`cargo test` performance regression on Windows on version 1.75.0 #119560

Jefffrey · 2024-01-03T21:22:06Z

Code

DataFusion issue with complete details: apache/datafusion#8696

arrow-datafusion runs cargo test on Windows runner as part of CI. When on Rust version 1.74.1 (and below), the check takes under 30 minutes. After upgrading to Rust version 1.75.0, it now takes over 3 hours, with no other change in code on our side. This seems to only take effect on Windows, as Linux/Mac tests didn't seem to be affected.

After debugging, I found the regression occurs between toolchains nightly-2023-10-29 (rust e5cfc5547) and nightly-2023-10-30 (rust 608e9682f).

We are running on GitHub actions runner windows-latest.

So expected run is here, on toolchain nightly-2023-10-29: https://github.com/apache/arrow-datafusion/actions/runs/7394674719/job/20116418078

Commit: apache/datafusion@c85e67c
Duration of 23m 43s

When I bump to toolchain nightly-2023-10-30, with no other code changes: https://github.com/apache/arrow-datafusion/actions/runs/7394848426/job/20116910586

Commit: apache/datafusion@2e99e21
Duration of 3h 22m 44s

Slow tests

The slowness occurs primarily in two tests.

tpcds_planning

Test code: https://github.com/apache/arrow-datafusion/blob/1179a76567892b259c88f08243ee01f05c4c3d5c/datafusion/core/tests/tpcds_planning.rs

On the good run (before regression):

Log: https://github.com/apache/arrow-datafusion/actions/runs/7394674719/job/20116418078#step:5:1886
Duration: 8m 15s

On the bad run (after regression):

Log: https://github.com/apache/arrow-datafusion/actions/runs/7394848426/job/20116910586#step:5:1888
Duration: 1h 24m 59s

sqllogictest

Test code: https://github.com/apache/arrow-datafusion/tree/1179a76567892b259c88f08243ee01f05c4c3d5c/datafusion/sqllogictest

On the good run (before regression):

Log: https://github.com/apache/arrow-datafusion/actions/runs/7394674719/job/20116418078#step:5:5691
Duration: 6m 3s (calculated from the timestamps from start of run, until start of next test run)

On the bad run (after regression):

Log: https://github.com/apache/arrow-datafusion/actions/runs/7394848426/job/20116910586#step:5:5717
Duration: 1h 46m

These two tests are the only ones with a significant delta, the rest don't seem affected by the upgrade.

Version it worked on

Ran fast with Rust 1.74.1 (and nightly-2023-10-29)

Version with regression

Ran slow on Rust 1.75.0 (and nightly-2023-10-30)

Additional context

Apologies if the example is too large to easily determine where the issue is. I'll try to reduce this to a smaller MRE, as I don't have a Windows machine to locally test on, so have had to check via CI.

The text was updated successfully, but these errors were encountered:

comphead · 2024-01-03T21:33:58Z

Maybe this also helps a bit to investigate, if we run datafusion windows CI with

RUSTFLAGS: "-C debuginfo=0 -C opt-level=1 -C target-feature=+crt-static -C incremental=false -C codegen-units=256"

it helps to reduce timing from 190m to 60m, but it still way longer than before

ChrisDenton · 2024-01-03T21:59:16Z

It might help to look at the raw log and narrow down which steps specifically are taking a long time.

So the issue is maybe around the generation of debuginfo?

Jefffrey · 2024-01-04T00:04:55Z

Just a quick update, narrowed down the testing:

Limit to running only tpcds_physical_q44 with backtrace feature:

cargo test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
Where our backtrace feature will enable backtrace from std:
- https://github.com/apache/arrow-datafusion/blob/ca260d99f17ef667b7f06d2da4a67255d27c94a9/datafusion/common/src/error.rs#L18-L20
- https://github.com/apache/arrow-datafusion/blob/ca260d99f17ef667b7f06d2da4a67255d27c94a9/datafusion/common/src/error.rs#L429-L451

good (2023-10-29 nightly) = 70 seconds

bad (2023-10-30 nightly) = 692 seconds

Without backtrace:

cargo test -p datafusion --test tpcds_planning tpcds_physical_q44

good (2023-10-29 nightly) = 1.25 seconds

bad (2023-10-30 nightly) = 1.26 seconds

Edit:

Also just to be clear, we are running with env vars:

RUSTFLAGS: "-C debuginfo=0"
RUST_BACKTRACE: "1"

For both good & bad runs.

Maybe this also helps a bit to investigate, if we run datafusion windows CI with
RUSTFLAGS: "-C debuginfo=0 -C opt-level=1 -C target-feature=+crt-static -C incremental=false -C codegen-units=256"
it helps to reduce timing from 190m to 60m, but it still way longer than before

The speedup seems to occur due to adding -C opt-level=1 -C target-feature=+crt-static -C incremental=false -C codegen-units=256 flags as the debuginfo flag was already set previously

saethlin · 2024-01-04T00:51:02Z

#117089 was merged in the right window, and the PR description notes that it contains rust-lang/backtrace-rs#569 which sounds relevant.

apiraino · 2024-01-04T11:18:24Z

cc @wesleywiser who authored #117089 and changes to the backtrace-rs submodule (IIUC)

wesleywiser · 2024-01-06T02:50:02Z

I'm able to repro locally:

> $env:RUSTFLAGS="-Cdebuginfo=0"
>  cargo +nightly-2023-10-29 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
test tpcds_physical_q44 ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 197 filtered out; finished in 36.23s

> cargo +nightly-2023-10-30 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
running 1 test
test tpcds_physical_q44 has been running for over 60 seconds

Looking at profiling data, the vast majority of time in the regressed case is spent in dbghelp!SymAddrIncludeInlineTrace which is a necessary call I introduced in rust-lang/backtrace-rs#569 when symbolizing the backtrace. I'm investigating if there is some other way to avoid calling that function but haven't had any luck.

One thing that's very interesting is that if do not set RUSTFLAGS="-Cdebuginfo=0", the regression disappears. This suggests to me we're hitting a pathological issue in dbghelp when there isn't any debug info present. On Windows, debug info is basically required to get any kind of meaningful backtrace so setting both -Cdebuginfo=0 and --feature backtrace in the test case doesn't make much sense to me.

@Jefffrey do you know why -Cdebuginfo=0 is being set? My assumption is that the intent is to make builds faster while still being able to test --feature backtrace. If that's the case, I would suggest actually setting -Cdebuginfo=line-tables-only which generates the minimal amount of information to give proper backtraces and doesn't have much impact on compile times (especially in comparison to -Cdebuginfo=2 which is the default in this case):

> $env:RUSTFLAGS="-Cdebuginfo=0"
>  cargo +nightly-2023-10-30 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
    Finished test [unoptimized + debuginfo] target(s) in 1m 25s

> $env:RUSTFLAGS="-Cdebuginfo=line-tables-only"
> cargo +nightly-2023-10-30 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
    Finished test [unoptimized + debuginfo] target(s) in 1m 32s

> $env:RUSTFLAGS="-Cdebuginfo=2"
> cargo +nightly-2023-10-30 test -p datafusion --test tpcds_planning tpcds_physical_q44 --features backtrace
...
    Finished test [unoptimized + debuginfo] target(s) in 2m 26s

Jefffrey · 2024-01-06T04:36:04Z

Thank you for the detailed breakdown! @wesleywiser

@Jefffrey do you know why -Cdebuginfo=0 is being set? My assumption is that the intent is to make builds faster while still being able to test --feature backtrace. If that's the case, I would suggest actually setting -Cdebuginfo=line-tables-only which generates the minimal amount of information to give proper backtraces and doesn't have much impact on compile times (especially in comparison to -Cdebuginfo=2 which is the default in this case):

The reasoning you provide makes sense, I'm not sure why both debuginfo=0 was set along with backtrace (probably had debuginfo=0 set first then backtrace feature was added as a separate feature so needed to be added to test).

There was a comment stating do not produce debug symbols to keep memory usage down for the debuginfo=0 setting fwiw, so I guess that's the main reason

I'll experiment with setting debuginfo=line-tables-only and will see if that helps speed things back up 👍

Jefffrey · 2024-01-06T10:22:35Z

Thanks for the config fix suggested @wesleywiser

This does eliminate the performance regression (or at least bring the times within a more reasonable range of each other).

Even though it was caused by some weird settings on our end (disabling debuginfo but still using backtrace, on Windows), it was still a significant regression in performance. Let me know if this is the expected behaviour now and I can close this issue, or to keep this open if you're planning to investigate this weird edge case further 👍 (or move it to backtrace repo?)

ChrisDenton · 2024-02-28T18:11:31Z

This was discussed in the libs meeting. While this is an edge case and a likely bug in dbghlp, it was suggested that maybe backtrace could workaround it by somehow detecting if there is no debug info present? Even if it can only tell after trying, it could perhaps cache this knowledge so it doesn't try again the next time?

Jefffrey added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Jan 3, 2024

rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 3, 2024

This was referenced Jan 3, 2024

Windows CI test is extremely long (~3 hours) apache/datafusion#8696

Closed

[CI] Improve windows machine CI test time apache/datafusion#8730

Merged

apiraino removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jan 4, 2024

comphead mentioned this issue Jan 4, 2024

Add schema_err! error macros with optional backtrace apache/datafusion#8620

Merged

Jefffrey mentioned this issue Jan 6, 2024

ci: speed up win64 test apache/datafusion#8728

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`cargo test` performance regression on Windows on version 1.75.0 #119560

`cargo test` performance regression on Windows on version 1.75.0 #119560

Jefffrey commented Jan 3, 2024

comphead commented Jan 3, 2024 •

edited

ChrisDenton commented Jan 3, 2024

Jefffrey commented Jan 4, 2024 •

edited

saethlin commented Jan 4, 2024

apiraino commented Jan 4, 2024

wesleywiser commented Jan 6, 2024 •

edited

Jefffrey commented Jan 6, 2024

Jefffrey commented Jan 6, 2024

ChrisDenton commented Feb 28, 2024

cargo test performance regression on Windows on version 1.75.0 #119560

cargo test performance regression on Windows on version 1.75.0 #119560

Comments

Jefffrey commented Jan 3, 2024

Code

Slow tests

tpcds_planning

sqllogictest

Version it worked on

Version with regression

Additional context

comphead commented Jan 3, 2024 • edited

ChrisDenton commented Jan 3, 2024

Jefffrey commented Jan 4, 2024 • edited

saethlin commented Jan 4, 2024

apiraino commented Jan 4, 2024

wesleywiser commented Jan 6, 2024 • edited

Jefffrey commented Jan 6, 2024

Jefffrey commented Jan 6, 2024

ChrisDenton commented Feb 28, 2024

`cargo test` performance regression on Windows on version 1.75.0 #119560

`cargo test` performance regression on Windows on version 1.75.0 #119560

comphead commented Jan 3, 2024 •

edited

Jefffrey commented Jan 4, 2024 •

edited

wesleywiser commented Jan 6, 2024 •

edited