Some timing tests fail on ARM #1466

TerminadaPool · 2024-01-27T00:07:39Z

Context & versions

mithril: v2403.1
Platform: ARM64, https://www.solid-run.com/embedded-networking/nxp-lx2160a-family/cex7-lx2160/#carrier-boards

Steps to reproduce

make test

Actual behavior

test utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_less_than_100_milliseconds ... FAILED
test utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds ... FAILED

failures:

---- utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds stdout ----
thread 'utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds' panicked at mithril-client-cli/src/utils/progress_reporter.rs:160:9:
Not expected value in json output: {"timestamp": "2024-01-27T00:01:31.427344062+00:00", "bytes_downloaded": 0, "bytes_total": 10, "seconds_left": 0.000, "seconds_elapsed": 5.126}
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

---- utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_less_than_100_milliseconds stdout ----
thread 'utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_less_than_100_milliseconds' panicked at mithril-client-cli/src/utils/progress_reporter.rs:173:9:
Not expected value in json output: {"timestamp": "2024-01-27T00:01:31.426460563+00:00", "bytes_downloaded": 0, "bytes_total": 10, "seconds_left": 0.000, "seconds_elapsed": 5.007}

failures:
utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_less_than_100_milliseconds
utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds

test result: FAILED. 18 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.14s

Is this platform is too slow?

The text was updated successfully, but these errors were encountered:

jpraynaud · 2024-01-30T09:32:39Z

Same bug noticed on the Hydra CI:
https://github.com/input-output-hk/mithril/pull/1472/checks?check_run_id=21007590367

failures:

---- utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds stdout ----
thread 'utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds' panicked at mithril-client-cli/src/utils/progress_reporter.rs:160:9:
Not expected value in json output: {"timestamp": "2024-01-26T11:20:10.550875+00:00", "bytes_downloaded": 0, "bytes_total": 10, "seconds_left": 0.000, "seconds_elapsed": 5.125}
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds

test result: FAILED(B. 19 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.25s

error: test failed, to rerun pass `-p mithril-client-cli --lib`

jpraynaud · 2024-02-02T15:21:03Z

Same bug noticed on the GitHub Actions CI:
https://github.com/input-output-hk/mithril/actions/runs/7757675676/job/21157694463#step:5:851

     Summary [ 167.911s] 811 tests run: 810 passed (2 slow), 1 failed, 0 skipped
        FAIL [   0.010s] mithril-client-cli utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds

--- STDOUT:              mithril-client-cli utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds ---

running 1 test
test utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds ... FAILED

failures:

failures:
    utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 19 filtered out; finished in 0.00s


--- STDERR:              mithril-client-cli utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds ---
thread 'utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds' panicked at mithril-client-cli/src/utils/progress_reporter.rs:160:9:
Not expected value in json output: {"timestamp": "2024-02-02T15:01:07.324388155+00:00", "bytes_downloaded": 0, "bytes_total": 10, "seconds_left": 0.000, "seconds_elapsed": 5.125}
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

TrevorBenson · 2024-05-04T23:48:36Z

@jpraynaud Is there any status update on arm64 based builds? If this bug is resolved will future Mithril releases include an arm64/aarch64 based release binary?

Thanks in advance.

Alenar · 2024-05-06T08:54:34Z

@TrevorBenson We did not progress on this issue so this won't be included in the next release.
Also we won't provide arm64/aarch64 based build for the time being, we are using github action hosted runners to build our binaries and there's no arm64 runner available right now (github announced a private beta late october 2023 but I don't know more).

About this issue do this test still fails on your ends? If yes does it fails every time or only sporadically? The reproduction is extremely rare on our CI but they're all x86 based.

This bug is kind of surprising since the failing test use only static data:

    #[test]
    fn check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds() {
        let progress_bar = ProgressBar::new(10).with_elapsed(Duration::from_millis(5124));

        let json_string = ProgressBarJsonFormatter::format(&progress_bar);

        assert!(
            json_string.contains(r#""seconds_elapsed": 5.124"#),
            "Not expected value in json output: {}",
            json_string
        );
    }

And the format function only use straightforward calls (code simplified):

    pub fn format(progress_bar: &ProgressBar) -> String {
        format!(
            r#"{{""seconds_elapsed": {}.{:0>3}}}"#,
            progress_bar.elapsed().as_secs(), 
            progress_bar.elapsed().subsec_millis(),
        )
    }

So it's highly unlikely that's due to the host compute speed.

TrevorBenson · 2024-05-06T22:18:54Z

We did not progress on this issue so this won't be included in the next release. Also we won't provide arm64/aarch64 based build for the time being, we are using github action hosted runners to build our binaries and there's no arm64 runner available right now (github announced a private beta late october 2023 but I don't know more).

Binaries for arm64 can be built on the x86_64 based github hosted runners, although requires extra actions/steps involving containers.

action: docker/setup-qemu-action
action: docker/build-push-action
- input: platforms: linux/arm64
extra step: start container & copy the binaries out for the artifact upload

The above is food for thought in case the need arises before arm64 hosted runners are available. There is no rush on my side to retool the CI, I produce arm64 binaries for my own use with this method when upstream does not.

About this issue do this test still fails on your ends? If yes does it fails every time or only sporadically? The reproduction is extremely rare on our CI but they're all x86 based.

I stopped running tests due to failed CI runs. I've created a testing branch and re-enabled the tests for all components. I'll track which tests fail for which component and report findings back later today.

TrevorBenson · 2024-05-06T22:51:48Z

We did not progress on this issue so this won't be included in the next release. Also we won't provide arm64/aarch64 based build for the time being, we are using github action hosted runners to build our binaries and there's no arm64 runner available right now (github announced a private beta late october 2023 but I don't know more).

Binaries for arm64 can be built on the x86_64 based github hosted runners, although requires extra actions/steps involving containers.

action: docker/setup-qemu-action

action: docker/build-push-action

input: platforms: linux/arm64

extra step: start container & copy the binaries out for the artifact upload

I think I forgot to mention docker/setup-buildx-action is required to use docker/setup-qemu-action and platforms. The docs are not very explicit about it, but I recall build failures (not related to this ticket, just arm64 in general) when I disabled the buildx caches but tried to use qemu.

TrevorBenson · 2024-05-07T06:59:33Z

On an Ampere system:

Mithril Aggregator
- passes make test
- fails make debug with Error: configuration deserialize error
Mithril Signer
- passes make test
- fails make debug with Error: configuration deserialize error
Mithril Client
- fails make test with:

failures:

---- utils::cardano_db_download_checker::test::unix_only::return_error_if_directory_could_not_be_created stdout ----
thread 'utils::cardano_db_download_checker::test::unix_only::return_error_if_directory_could_not_be_created' panicked at mithril-client-cli/src/utils/cardano_db_download_checker.rs:253:18:
ensure_dir_exist should fail: ()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- utils::cardano_db_download_checker::test::unix_only::return_error_if_existing_directory_is_not_writable stdout ----
thread 'utils::cardano_db_download_checker::test::unix_only::return_error_if_existing_directory_is_not_writable' panicked at mithril-client-cli/src/utils/cardano_db_download_checker.rs:282:14:
check_prerequisites should fail: ()


failures:
    utils::cardano_db_download_checker::test::unix_only::return_error_if_directory_could_not_be_created
    utils::cardano_db_download_checker::test::unix_only::return_error_if_existing_directory_is_not_writable

test result: FAILED. 22 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.18s

error: test failed, to rerun pass `--lib`
make: *** [Makefile:23: test] Error 101

All three succeed if only make build is run.

TrevorBenson · 2024-05-09T07:13:23Z

About this issue do this test still fails on your ends? If yes does it fails every time or only sporadically? The reproduction is extremely rare on our CI but they're all x86 based.

The tests you mentioned, check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds, did not fail. I forgot I had meant to follow up the following morning with a gist of the complete results of the test and failure. Here is the link to the gist

ch1bo · 2024-05-14T07:54:54Z

CC @locallycompact as you had been looking into arm64 builds for Hydra lately. Seems like we could mutualize any native builders and/or look into the qemu approach if cross compilation via nix proves to be annoying?

TrevorBenson · 2024-05-14T16:56:16Z

CC @locallycompact as you had been looking into arm64 builds for Hydra lately. Seems like we could mutualize any native builders and/or look into the qemu approach if cross compilation via nix proves to be annoying?

FWIW I have had success building with QEMU, but sometimes the make test or make debug have errors that don't occur when building on arm64 hardware or virtual machines. Just want to make sure you are aware since it is useful, but not as reliable as an actual arm64 runner should be.

sfauvel · 2024-05-15T08:04:27Z

Hello @TrevorBenson @TerminadaPool,
We have rewrite progress_reporter tests to make them less dependent on the execution environment.
We are not able to test them on ARM. We'll let you check that they pass before closing this issue.

For failing cardano_db_download_checker tests, it will be done in another issue. Thank you to detail in the ticket the specificities of this environment.

jpraynaud added the to-investigate 🔎 Need further investigation label Jan 29, 2024

jpraynaud assigned sfauvel May 13, 2024

sfauvel mentioned this issue May 13, 2024

Stabilize test removing most of the time dependency #1673

Merged

12 tasks

sfauvel closed this as completed in #1673 May 15, 2024

sfauvel reopened this May 15, 2024

jpraynaud closed this as completed Jun 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some timing tests fail on ARM #1466

Some timing tests fail on ARM #1466

TerminadaPool commented Jan 27, 2024

jpraynaud commented Jan 30, 2024 •

edited

Loading

jpraynaud commented Feb 2, 2024 •

edited

Loading

TrevorBenson commented May 4, 2024

Alenar commented May 6, 2024

TrevorBenson commented May 6, 2024

TrevorBenson commented May 6, 2024

TrevorBenson commented May 7, 2024 •

edited

Loading

TrevorBenson commented May 9, 2024

ch1bo commented May 14, 2024

TrevorBenson commented May 14, 2024

sfauvel commented May 15, 2024

Some timing tests fail on ARM #1466

Some timing tests fail on ARM #1466

Comments

TerminadaPool commented Jan 27, 2024

Context & versions

Steps to reproduce

Actual behavior

jpraynaud commented Jan 30, 2024 • edited Loading

jpraynaud commented Feb 2, 2024 • edited Loading

TrevorBenson commented May 4, 2024

Alenar commented May 6, 2024

TrevorBenson commented May 6, 2024

TrevorBenson commented May 6, 2024

TrevorBenson commented May 7, 2024 • edited Loading

TrevorBenson commented May 9, 2024

ch1bo commented May 14, 2024

TrevorBenson commented May 14, 2024

sfauvel commented May 15, 2024

jpraynaud commented Jan 30, 2024 •

edited

Loading

jpraynaud commented Feb 2, 2024 •

edited

Loading

TrevorBenson commented May 7, 2024 •

edited

Loading