Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some timing tests fail on ARM #1466

Open
TerminadaPool opened this issue Jan 27, 2024 · 11 comments · Fixed by #1673
Open

Some timing tests fail on ARM #1466

TerminadaPool opened this issue Jan 27, 2024 · 11 comments · Fixed by #1673
Assignees
Labels
to-investigate 🔎 Need further investigation

Comments

@TerminadaPool
Copy link

Context & versions

mithril: v2403.1
Platform: ARM64, https://www.solid-run.com/embedded-networking/nxp-lx2160a-family/cex7-lx2160/#carrier-boards

Steps to reproduce

make test

Actual behavior

test utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_less_than_100_milliseconds ... FAILED
test utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds ... FAILED

failures:

---- utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds stdout ----
thread 'utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds' panicked at mithril-client-cli/src/utils/progress_reporter.rs:160:9:
Not expected value in json output: {"timestamp": "2024-01-27T00:01:31.427344062+00:00", "bytes_downloaded": 0, "bytes_total": 10, "seconds_left": 0.000, "seconds_elapsed": 5.126}
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

---- utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_less_than_100_milliseconds stdout ----
thread 'utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_less_than_100_milliseconds' panicked at mithril-client-cli/src/utils/progress_reporter.rs:173:9:
Not expected value in json output: {"timestamp": "2024-01-27T00:01:31.426460563+00:00", "bytes_downloaded": 0, "bytes_total": 10, "seconds_left": 0.000, "seconds_elapsed": 5.007}

failures:
utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_less_than_100_milliseconds
utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds

test result: FAILED. 18 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.14s

Is this platform is too slow?

@jpraynaud jpraynaud added the to-investigate 🔎 Need further investigation label Jan 29, 2024
@jpraynaud
Copy link
Member

jpraynaud commented Jan 30, 2024

Same bug noticed on the Hydra CI:
https://github.com/input-output-hk/mithril/pull/1472/checks?check_run_id=21007590367

failures:

---- utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds stdout ----
thread 'utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds' panicked at mithril-client-cli/src/utils/progress_reporter.rs:160:9:
Not expected value in json output: {"timestamp": "2024-01-26T11:20:10.550875+00:00", "bytes_downloaded": 0, "bytes_total": 10, "seconds_left": 0.000, "seconds_elapsed": 5.125}
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds

test result: FAILED(B. 19 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.25s

error: test failed, to rerun pass `-p mithril-client-cli --lib`

@jpraynaud
Copy link
Member

jpraynaud commented Feb 2, 2024

Same bug noticed on the GitHub Actions CI:
https://github.com/input-output-hk/mithril/actions/runs/7757675676/job/21157694463#step:5:851

     Summary [ 167.911s] 811 tests run: 810 passed (2 slow), 1 failed, 0 skipped
        FAIL [   0.010s] mithril-client-cli utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds

--- STDOUT:              mithril-client-cli utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds ---

running 1 test
test utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds ... FAILED

failures:

failures:
    utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 19 filtered out; finished in 0.00s


--- STDERR:              mithril-client-cli utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds ---
thread 'utils::progress_reporter::tests::check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds' panicked at mithril-client-cli/src/utils/progress_reporter.rs:160:9:
Not expected value in json output: {"timestamp": "2024-02-02T15:01:07.324388155+00:00", "bytes_downloaded": 0, "bytes_total": 10, "seconds_left": 0.000, "seconds_elapsed": 5.125}
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@TrevorBenson
Copy link
Contributor

@jpraynaud Is there any status update on arm64 based builds? If this bug is resolved will future Mithril releases include an arm64/aarch64 based release binary?

Thanks in advance.

@Alenar
Copy link
Collaborator

Alenar commented May 6, 2024

@TrevorBenson We did not progress on this issue so this won't be included in the next release.
Also we won't provide arm64/aarch64 based build for the time being, we are using github action hosted runners to build our binaries and there's no arm64 runner available right now (github announced a private beta late october 2023 but I don't know more).

About this issue do this test still fails on your ends? If yes does it fails every time or only sporadically? The reproduction is extremely rare on our CI but they're all x86 based.

This bug is kind of surprising since the failing test use only static data:

    #[test]
    fn check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds() {
        let progress_bar = ProgressBar::new(10).with_elapsed(Duration::from_millis(5124));

        let json_string = ProgressBarJsonFormatter::format(&progress_bar);

        assert!(
            json_string.contains(r#""seconds_elapsed": 5.124"#),
            "Not expected value in json output: {}",
            json_string
        );
    }

And the format function only use straightforward calls (code simplified):

    pub fn format(progress_bar: &ProgressBar) -> String {
        format!(
            r#"{{""seconds_elapsed": {}.{:0>3}}}"#,
            progress_bar.elapsed().as_secs(), 
            progress_bar.elapsed().subsec_millis(),
        )
    }

So it's highly unlikely that's due to the host compute speed.

@TrevorBenson
Copy link
Contributor

We did not progress on this issue so this won't be included in the next release. Also we won't provide arm64/aarch64 based build for the time being, we are using github action hosted runners to build our binaries and there's no arm64 runner available right now (github announced a private beta late october 2023 but I don't know more).

Binaries for arm64 can be built on the x86_64 based github hosted runners, although requires extra actions/steps involving containers.

  • action: docker/setup-qemu-action
  • action: docker/build-push-action
    • input: platforms: linux/arm64
  • extra step: start container & copy the binaries out for the artifact upload

The above is food for thought in case the need arises before arm64 hosted runners are available. There is no rush on my side to retool the CI, I produce arm64 binaries for my own use with this method when upstream does not.

About this issue do this test still fails on your ends? If yes does it fails every time or only sporadically? The reproduction is extremely rare on our CI but they're all x86 based.

I stopped running tests due to failed CI runs. I've created a testing branch and re-enabled the tests for all components. I'll track which tests fail for which component and report findings back later today.

@TrevorBenson
Copy link
Contributor

We did not progress on this issue so this won't be included in the next release. Also we won't provide arm64/aarch64 based build for the time being, we are using github action hosted runners to build our binaries and there's no arm64 runner available right now (github announced a private beta late october 2023 but I don't know more).

Binaries for arm64 can be built on the x86_64 based github hosted runners, although requires extra actions/steps involving containers.

  • action: docker/setup-qemu-action

  • action: docker/build-push-action

    • input: platforms: linux/arm64
  • extra step: start container & copy the binaries out for the artifact upload

I think I forgot to mention docker/setup-buildx-action is required to use docker/setup-qemu-action and platforms. The docs are not very explicit about it, but I recall build failures (not related to this ticket, just arm64 in general) when I disabled the buildx caches but tried to use qemu.

@TrevorBenson
Copy link
Contributor

TrevorBenson commented May 7, 2024

On an Ampere system:

  • Mithril Aggregator
    • passes make test
    • fails make debug with Error: configuration deserialize error
  • Mithril Signer
    • passes make test
    • fails make debug with Error: configuration deserialize error
  • Mithril Client
    • fails make test with:
failures:

---- utils::cardano_db_download_checker::test::unix_only::return_error_if_directory_could_not_be_created stdout ----
thread 'utils::cardano_db_download_checker::test::unix_only::return_error_if_directory_could_not_be_created' panicked at mithril-client-cli/src/utils/cardano_db_download_checker.rs:253:18:
ensure_dir_exist should fail: ()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- utils::cardano_db_download_checker::test::unix_only::return_error_if_existing_directory_is_not_writable stdout ----
thread 'utils::cardano_db_download_checker::test::unix_only::return_error_if_existing_directory_is_not_writable' panicked at mithril-client-cli/src/utils/cardano_db_download_checker.rs:282:14:
check_prerequisites should fail: ()


failures:
    utils::cardano_db_download_checker::test::unix_only::return_error_if_directory_could_not_be_created
    utils::cardano_db_download_checker::test::unix_only::return_error_if_existing_directory_is_not_writable

test result: FAILED. 22 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.18s

error: test failed, to rerun pass `--lib`
make: *** [Makefile:23: test] Error 101

All three succeed if only make build is run.

@TrevorBenson
Copy link
Contributor

About this issue do this test still fails on your ends? If yes does it fails every time or only sporadically? The reproduction is extremely rare on our CI but they're all x86 based.

The tests you mentioned, check_seconds_elapsed_in_json_report_with_more_than_100_milliseconds, did not fail. I forgot I had meant to follow up the following morning with a gist of the complete results of the test and failure. Here is the link to the gist

@ch1bo
Copy link
Member

ch1bo commented May 14, 2024

CC @locallycompact as you had been looking into arm64 builds for Hydra lately. Seems like we could mutualize any native builders and/or look into the qemu approach if cross compilation via nix proves to be annoying?

@TrevorBenson
Copy link
Contributor

CC @locallycompact as you had been looking into arm64 builds for Hydra lately. Seems like we could mutualize any native builders and/or look into the qemu approach if cross compilation via nix proves to be annoying?

FWIW I have had success building with QEMU, but sometimes the make test or make debug have errors that don't occur when building on arm64 hardware or virtual machines. Just want to make sure you are aware since it is useful, but not as reliable as an actual arm64 runner should be.

@sfauvel sfauvel reopened this May 15, 2024
@sfauvel
Copy link
Collaborator

sfauvel commented May 15, 2024

Hello @TrevorBenson @TerminadaPool,
We have rewrite progress_reporter tests to make them less dependent on the execution environment.
We are not able to test them on ARM. We'll let you check that they pass before closing this issue.

For failing cardano_db_download_checker tests, it will be done in another issue. Thank you to detail in the ticket the specificities of this environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
to-investigate 🔎 Need further investigation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants