
Add Timestamp_Secs to V3 report + timestamp-based windowing (#581) #581

Open
marziehlenjaniMeta wants to merge 1 commit into facebookresearch:v2-beta from marziehlenjaniMeta:export-D101064874-to-v2-beta
@marziehlenjaniMeta marziehlenjaniMeta commented Apr 15, 2026

Summary:

Two related fixes to ensure perfpub selects the correct time window for
perf-collector-timeseries CSVs across all platforms.

1. V3 report generator: add missing Timestamp_Secs column

The Neoverse V3 report generator (generate_arm_neoversev3_perf_report.py)
defines a timestamp() function that emits a Timestamp_Secs column, but
never calls it in the metrics list. This column is already emitted by the
V2/Grace and AMD report generators. Without it, perfpub cannot do
timestamp-based row selection for V3 CSVs and must fall back to row-count
arithmetic.
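
To illustrate why a function that is defined but never registered leaves the column out of the report, here is a minimal sketch. It assumes the metrics list is a list of callables that each return one named column; `build_report`, `ipc`, and the input column names are hypothetical and are not taken from the actual generator.

```python
import pandas as pd


def timestamp(df):
    # Hypothetical stand-in: expose the sample time as Timestamp_Secs.
    return df["time"].rename("Timestamp_Secs")


def ipc(df):
    # Hypothetical example metric: instructions per cycle.
    return (df["instructions"] / df["cycles"]).rename("IPC")


def build_report(df, metrics):
    # One output column per metric function in the list.
    return pd.concat([metric(df) for metric in metrics], axis=1)


samples = pd.DataFrame(
    {"time": [5.0, 10.0], "instructions": [200.0, 300.0], "cycles": [100.0, 100.0]}
)

# timestamp() exists but is not in the list, so the column is missing.
without_ts = build_report(samples, [ipc])

# Adding it to the list is enough to emit the column.
with_ts = build_report(samples, [timestamp, ipc])
```

Under this assumption, the fix amounts to including `timestamp` in the list that the generator iterates over.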

Note: the index column in perf-collector-timeseries CSVs is a raw
DataFrame row number, NOT a timestamp. It jumps by 67-94 per row because
perf stat emits multiple events per timestamp, while consecutive rows are
actually ~5 seconds apart. The Timestamp_Secs column provides the real
timestamp.
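
The effect can be reproduced with a toy frame. The sketch below assumes a long-format table with one row per (time, event) pair, which is only an approximation of the real perf stat parsing; the column names and the three events are made up for illustration.

```python
import pandas as pd

# Three hypothetical events sampled at 5-second intervals.
events = ["cycles", "instructions", "cache-misses"]
rows = [(t, e, 1.0) for t in (5.0, 10.0, 15.0) for e in events]
raw = pd.DataFrame(rows, columns=["time", "event", "value"])

# Keep the first row of each timestamp; the original row labels survive.
first_per_ts = raw.groupby("time", sort=True).head(1)

index_values = first_per_ts.index.tolist()   # row numbers, step = number of events
timestamps = first_per_ts["time"].tolist()   # real time, step = 5 seconds
```

With three events the index advances by 3 per row while the time advances by 5 seconds; with 67 to 94 events per timestamp, the index would advance by that many instead.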

2. perfpub: use timestamp-based windowing for --last-secs/--skip-last-secs

When --last-secs and --skip-last-secs are specified, get_start_end_index()
previously used row-count arithmetic (len(df) - ceil(last_secs / interval)),
which depends on all CSVs having the same collection interval. This is fragile:
if different CSVs start collection at different times or have different
intervals, the row-count approach selects different absolute time windows.
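
A small sketch of the row-count approach shows the problem. The `len(df) - ceil(last_secs / interval)` term comes from the description above; the handling of `skip_last_secs` and the function name `row_count_window` are assumptions made for illustration, not the actual perfpub code.

```python
import math


def row_count_window(n_rows, last_secs, skip_last_secs, interval):
    # Convert both durations to row counts using one assumed interval.
    skip_rows = math.ceil(skip_last_secs / interval)
    take_rows = math.ceil(last_secs / interval)
    end = n_rows - skip_rows
    start = max(end - take_rows, 0)
    return start, end


# CSV A: 60 rows sampled every 5 s. CSV B: 30 rows sampled every 10 s.
# Both span 300 s, but the same assumed 5 s interval is applied to each.
window_a = row_count_window(60, last_secs=100, skip_last_secs=0, interval=5)
window_b = row_count_window(30, last_secs=100, skip_last_secs=0, interval=5)

# Seconds actually covered by each selection.
covered_a = (window_a[1] - window_a[0]) * 5
covered_b = (window_b[1] - window_b[0]) * 10
```

Both selections take 20 rows, but A covers 100 seconds and B covers 200 seconds, so the two CSVs no longer describe the same period.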

Now, for CSVs with relative_secs or epoch_secs timestamp columns,
perfpub computes the target time window from the last timestamp
(end = last_ts - skip_last_secs, start = end - last_secs) and finds
the closest matching rows. This reuses the same idxmin() pattern already
used for the breakdown.csv path, ensuring all CSVs cover the same absolute
time window.
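
The windowing described above can be sketched as follows. The formulas for `end` and `start` and the use of `idxmin()` follow the description; the function name `timestamp_window` and its signature are hypothetical, and the real `get_start_end_index()` may differ in details such as tie-breaking or bounds checks.

```python
import pandas as pd


def timestamp_window(df, ts_col, last_secs, skip_last_secs):
    ts = df[ts_col]
    # Anchor the window on the last recorded timestamp.
    end_ts = ts.iloc[-1] - skip_last_secs
    start_ts = end_ts - last_secs
    # Pick the rows whose timestamps are closest to the targets.
    start_idx = (ts - start_ts).abs().idxmin()
    end_idx = (ts - end_ts).abs().idxmin()
    return start_idx, end_idx


# Two CSVs with different intervals but the same final timestamp (300 s).
df_a = pd.DataFrame({"relative_secs": [5.0 * (i + 1) for i in range(60)]})
df_b = pd.DataFrame({"relative_secs": [10.0 * (i + 1) for i in range(30)]})

win_a = timestamp_window(df_a, "relative_secs", last_secs=100, skip_last_secs=50)
win_b = timestamp_window(df_b, "relative_secs", last_secs=100, skip_last_secs=50)
```

The row indices differ between the two frames, but both selections run from 150 s to 250 s, which is the property the row-count approach could not guarantee.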

The row-count arithmetic is preserved as a fallback for time_of_day
CSVs (mpstat, memstat, etc.) and for old data lacking timestamp columns.
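
One way to express the fallback decision is a small column check. This is a sketch only: the helper name `pick_timestamp_column` and the order of preference between `relative_secs` and `epoch_secs` are assumptions, not behavior confirmed by the change.

```python
def pick_timestamp_column(columns):
    # Return the first usable timestamp column, or None to signal
    # that the caller should fall back to row-count arithmetic.
    for name in ("relative_secs", "epoch_secs"):
        if name in columns:
            return name
    return None


perf_choice = pick_timestamp_column(["index", "relative_secs", "IPC"])
mpstat_choice = pick_timestamp_column(["time_of_day", "usr", "sys"])
```

A CSV with only `time_of_day`, or an older CSV with no timestamp column at all, yields `None` and would take the row-count path.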

Differential Revision: D101064874

meta-cla Bot added the CLA Signed label Apr 15, 2026

meta-codesync Bot commented Apr 15, 2026

@marziehlenjaniMeta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D101064874.

meta-codesync Bot changed the title from "Add Timestamp_Secs to V3 report + timestamp-based windowing" to "Add Timestamp_Secs to V3 report + timestamp-based windowing (#581)" Apr 15, 2026
marziehlenjaniMeta added a commit to marziehlenjaniMeta/DCPerf that referenced this pull request Apr 15, 2026
…research#581)

@marziehlenjaniMeta marziehlenjaniMeta force-pushed the export-D101064874-to-v2-beta branch from 7c6b236 to 62ac571 Compare April 15, 2026 22:57
…research#581)

Summary:
Pull Request resolved: facebookresearch#581

@marziehlenjaniMeta marziehlenjaniMeta force-pushed the export-D101064874-to-v2-beta branch from 62ac571 to b837daa Compare April 15, 2026 23:00
meta-codesync Bot pushed a commit that referenced this pull request Apr 15, 2026
Summary:
Pull Request resolved: #581

Reviewed By: charles-typ

Differential Revision: D101064874

fbshipit-source-id: 520d98106347836e32bc29502dfa4d57bb66a7af

Labels

CLA Signed, fb-exported, meta-exported
