Add linux musl tests on Arm #3882

Closed
wants to merge 6 commits into from

Conversation

SeanTAllen
Member

No description provided.

@SeanTAllen SeanTAllen requested a review from a team October 5, 2021 02:59
@SeanTAllen SeanTAllen force-pushed the aarch64-musl branch 3 times, most recently from 6d49c84 to c0d8814 Compare October 5, 2021 15:11
@jemc
Member

jemc commented Oct 5, 2021

I notice that the tests didn't pass for musl.

@SeanTAllen
Member Author

Yup. We appear to have a problem when running on musl.

@SeanTAllen SeanTAllen force-pushed the aarch64-musl branch 2 times, most recently from 91a496a to b55cd24 Compare October 5, 2021 18:50
@SeanTAllen
Member Author

Setting up Alpine on a Pi to test this looks pretty irritating. My first try might be to see if Docker can run on Raspbian.

@SeanTAllen
Member Author

Rebased against main. Still need to investigate the cause of the musl failures.

@SeanTAllen
Member Author

It looks like it is the test harness that is segfaulting. Interesting.

@SeanTAllen
Member Author

The image used for this is based on Alpine 3.12 and needs to be updated to be based on 3.16.

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Jun 11, 2022
@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Jun 14, 2022
@SeanTAllen
Member Author

@ergl could you try running this in Docker on an Arm machine to debug what is going on? It appears the runner crashes, I think, but I don't have any way to debug this easily.

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Jul 8, 2022
@ergl
Member

ergl commented Jul 9, 2022

@SeanTAllen Sure, I can try it tomorrow when I have some time off.

@ergl
Member

ergl commented Jul 10, 2022

@SeanTAllen I managed to reproduce the segfault but unfortunately ran out of time to debug the problem. To trigger a segfault, it is enough to run the runner:

# ./build/build_debug/test/libponyc-run/runner/runner -h
Segmentation fault

I didn't have lldb installed on the Docker image, so I was out of luck. But that would be the next step.

@SeanTAllen
Member Author

Thanks @ergl. Will you be doing the next step?

@ergl
Member

ergl commented Jul 11, 2022

@SeanTAllen I can, but not until next week.

@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Jul 12, 2022
I was looking at #1206 and noticed
that when Benoit did the porting, he unintentionally changed the
atomics usage in a few places from what it was previously.

This commit reverts those (almost assuredly) inadvertent changes.

I can't guarantee that there are no other bits that were incorrect
in that commit, but I know these bits were.

I believe from a cursory glance that these could do "very bad things"
on weakly ordered memory platforms like Arm.
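
To illustrate the kind of ordering issue the commit message above is worried about, here is a minimal C11 sketch of release/acquire publication. It is not taken from the Pony runtime; `mailbox_t`, `mailbox_publish`, and `mailbox_try_consume` are hypothetical names made up for this example. On a weakly ordered platform such as Arm, weakening the two explicit orderings to `memory_order_relaxed` would allow a reader on another thread to see the flag set before the payload write is visible.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical one-shot mailbox; not part of the Pony runtime.
typedef struct
{
  int payload;       // plain (non-atomic) data
  atomic_bool ready; // flag that publishes the payload
} mailbox_t;

// Writer: store the payload, then set the flag with release ordering so
// the payload write cannot be reordered after the flag store.
void mailbox_publish(mailbox_t* m, int value)
{
  m->payload = value;
  atomic_store_explicit(&m->ready, true, memory_order_release);
}

// Reader: load the flag with acquire ordering. If the flag is set, the
// payload written before the matching release store is visible here.
bool mailbox_try_consume(mailbox_t* m, int* out)
{
  if(!atomic_load_explicit(&m->ready, memory_order_acquire))
    return false;

  *out = m->payload;
  return true;
}
```

A caller would zero-initialize a `mailbox_t`, call `mailbox_publish` from one thread, and poll `mailbox_try_consume` from another. On x86 a relaxed version of this code often appears to work because the hardware ordering is stronger, which is one reason such mistakes tend to surface only on Arm.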
@ergl
Member

ergl commented Jul 19, 2022

Here's the backtrace I get from the runner:

* thread #3, name = 'runner', stop reason = signal SIGSEGV: invalid address (fault address: 0xaaaaaaa8ff0b)
  * frame #0: 0x0000fffff7fab8e4 ld-musl-aarch64.so.1`strlen + 56
    frame #1: 0x0000fffff7f0958c libgcc_s.so.1`___lldb_unnamed_symbol238 + 28
    frame #2: 0x0000fffff7f096f4 libgcc_s.so.1`___lldb_unnamed_symbol239 + 100
    frame #3: 0x0000fffff7f0a788 libgcc_s.so.1`___lldb_unnamed_symbol245 + 760
    frame #4: 0x0000fffff7f0afdc libgcc_s.so.1`_Unwind_Find_FDE + 344
    frame #5: 0x0000fffff7f076e0 libgcc_s.so.1`___lldb_unnamed_symbol226 + 80
    frame #6: 0x0000fffff7f08658 libgcc_s.so.1`___lldb_unnamed_symbol229 + 84
    frame #7: 0x0000fffff7f08ce0 libgcc_s.so.1`_Unwind_RaiseException + 92
    frame #8: 0x0000aaaaaab0e1b4 runner`pony_error at posix_except.c:37:3
    frame #9: 0x0000aaaaaaae05d0 runner`___lldb_unnamed_symbol2258 + 1016
    frame #10: 0x0000aaaaaaade728 runner`___lldb_unnamed_symbol2250 + 840
    frame #11: 0x0000aaaaaaadfc20 runner`___lldb_unnamed_symbol2253 + 856
    frame #12: 0x0000aaaaaaae4c70 runner`___lldb_unnamed_symbol2286 + 2128
    frame #13: 0x0000aaaaaaad4f00 runner`Main_Dispatch + 92
    frame #14: 0x0000aaaaaab03e90 runner`handle_message(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, msg=0x0000fffff7efab40) at actor.c:400:7
    frame #15: 0x0000aaaaaab035d0 runner`ponyint_actor_run(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, polling=false) at actor.c:486:8
    frame #16: 0x0000aaaaaab1383c runner`run(sched=0x0000fffff7efa000) at scheduler.c:984:23
    frame #17: 0x0000aaaaaab12d14 runner`run_thread(arg=0x0000fffff7efa000) at scheduler.c:1035:3
    frame #18: 0x0000fffff7faebc0 ld-musl-aarch64.so.1
    frame #19: 0x0000fffff7fad418 ld-musl-aarch64.so.1
    frame #20: 0x0000aaaaaab1c654 runner`ponyint_thread_create(thread=<unavailable>, start=<unavailable>, cpu=<unavailable>, arg=<unavailable>) at threads.c:210:6
    frame #21: 0x0000aaaaaab12c3c runner`ponyint_sched_start(library=false) at scheduler.c:1210:9
    frame #22: 0x0000aaaaaab15008 runner`pony_start(library=false, exit_code=0x0000000000000000, language_features=0x0000fffffffffc18) at start.c:332:7
    frame #23: 0x0000aaaaaab033c0 runner`main + 240
    frame #24: 0x0000fffff7f73274 ld-musl-aarch64.so.1

It seems like the source of the failure is here: pony_error at posix_except.c:37:3 which points to this:

_Unwind_RaiseException(&exception);

So it seems like we're doing something wrong when it comes to exception unwinding. This can be verified with this minimal Pony program that also causes the segfault:

actor Main
  new create(env: Env) =>
    try
      error
    end

The backtrace for the above program:

* thread #3, name = 'crash_example', stop reason = signal SIGSEGV: invalid address (fault address: 0xaaaaaaa8ff0b)
  * frame #0: 0x0000fffff7fab8e4 ld-musl-aarch64.so.1`strlen + 56
    frame #1: 0x0000fffff7f0958c libgcc_s.so.1`___lldb_unnamed_symbol238 + 28
    frame #2: 0x0000fffff7f096f4 libgcc_s.so.1`___lldb_unnamed_symbol239 + 100
    frame #3: 0x0000fffff7f0a788 libgcc_s.so.1`___lldb_unnamed_symbol245 + 760
    frame #4: 0x0000fffff7f0afdc libgcc_s.so.1`_Unwind_Find_FDE + 344
    frame #5: 0x0000fffff7f076e0 libgcc_s.so.1`___lldb_unnamed_symbol226 + 80
    frame #6: 0x0000fffff7f08658 libgcc_s.so.1`___lldb_unnamed_symbol229 + 84
    frame #7: 0x0000fffff7f08ce0 libgcc_s.so.1`_Unwind_RaiseException + 92
    frame #8: 0x0000aaaaaaab59c0 crash_example`pony_error at posix_except.c:37:3
    frame #9: 0x0000aaaaaaaaa4f0 crash_example`Main_Dispatch + 56
    frame #10: 0x0000aaaaaaaac5e4 crash_example`handle_message(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, msg=0x0000fffff7efab40) at actor.c:400:7
    frame #11: 0x0000aaaaaaaabd24 crash_example`ponyint_actor_run(ctx=0x0000fffff7efa048, actor=0x0000fffff7ef8c00, polling=false) at actor.c:486:8
    frame #12: 0x0000aaaaaaabaa6c crash_example`run(sched=0x0000fffff7efa000) at scheduler.c:984:23
    frame #13: 0x0000aaaaaaab9f44 crash_example`run_thread(arg=0x0000fffff7efa000) at scheduler.c:1035:3
    frame #14: 0x0000fffff7faebc0 ld-musl-aarch64.so.1
    frame #15: 0x0000fffff7fad418 ld-musl-aarch64.so.1
    frame #16: 0x0000aaaaaaac450c crash_example`ponyint_thread_create(thread=<unavailable>, start=<unavailable>, cpu=<unavailable>, arg=<unavailable>) at threads.c:210:6
    frame #17: 0x0000aaaaaaab9e6c crash_example`ponyint_sched_start(library=false) at scheduler.c:1210:9
    frame #18: 0x0000aaaaaaabc238 crash_example`pony_start(library=false, exit_code=0x0000000000000000, language_features=0x0000fffffffffc68) at start.c:332:7
    frame #19: 0x0000aaaaaaaabbc0 crash_example`main + 240
    frame #20: 0x0000fffff7f73274 ld-musl-aarch64.so.1

Edit: this was tested against the latest commit of this branch (1e82be3)
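
For context on the call that both backtraces point at, the following is a rough C sketch of raising an exception through the unwinder interface declared in `<unwind.h>`. It is an illustration under assumptions, not the contents of posix_except.c: `demo_cleanup`, `demo_raise_unhandled`, and the "DEMO" exception class are hypothetical, and the sketch assumes an Itanium-style unwind ABI where `exception_class` is a 64-bit integer (which does not hold for 32-bit Arm EHABI).

```c
#include <string.h>
#include <unwind.h>

// Called by the unwinder if a foreign runtime discards the exception.
// Nothing to free here because the exception object is statically allocated.
static void demo_cleanup(_Unwind_Reason_Code reason,
  struct _Unwind_Exception* exc)
{
  (void)reason;
  (void)exc;
}

// Hypothetical helper: raise an exception that no frame handles and hand
// the unwinder's reason code back to the caller.
_Unwind_Reason_Code demo_raise_unhandled(void)
{
  static struct _Unwind_Exception exc;
  memset(&exc, 0, sizeof(exc));
  exc.exception_class = 0x44454D4F00000000ULL; // "DEMO", made up for this sketch
  exc.exception_cleanup = demo_cleanup;

  // Phase 1 walks the stack searching for a handler. In libgcc the
  // per-frame lookup goes through _Unwind_Find_FDE, the frame visible in
  // the backtraces. With no handler found, the call returns an error code
  // instead of transferring control.
  return _Unwind_RaiseException(&exc);
}
```

When no handler exists, `_Unwind_RaiseException` is expected to return a reason code other than `_URC_NO_REASON`. In the backtraces above the process never gets that far: the fault occurs inside the frame lookup itself, in `strlen` called from libgcc_s.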

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Jul 19, 2022
@SeanTAllen
Member Author

Given this only happens on Arm musl, I don't feel confident saying that we are doing something wrong.

@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Sep 2, 2022
@SeanTAllen
Member Author

Closing, as this requires Cirrus CI, which we are moving away from.

@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Aug 15, 2023
@SeanTAllen SeanTAllen closed this Aug 15, 2023
@SeanTAllen SeanTAllen deleted the aarch64-musl branch August 15, 2023 22:57
@ponylang-main ponylang-main removed the discuss during sync Should be discussed during an upcoming sync label Aug 15, 2023
Labels
do not merge This PR should not be merged at this time