ERL-79: Erlang VM segfaults when running tests which perform a lot of code loading/unloading #3395

Closed
OTP-Maintainer opened this issue Jan 22, 2016 · 11 comments
Labels: bug (Issue is reported as a bug), priority:medium, team:VM (Assigned to OTP team VM)

@OTP-Maintainer

Original reporter: chrisyunker
Affected versions: OTP-18.2.1, R16B
Fixed in version: OTP-18.3
Component: erts
Migrated from: https://bugs.erlang.org/browse/ERL-79


We've been having issues with segfaults on Mac and Linux. They occur when we run our integration tests, which perform a lot of code loading and unloading, so the crashes may be related to that. We generally don't see this in production, only the occasional crash.

We've seen this on both R16B and 18.2.1, and on both Mac OS and Linux (CentOS 6).

We have a coredump file, but since it contains proprietary data we don't want to share it publicly. We're happy to share it with the OTP team only. Please let me know.

The segfaults are always in this location (at least since we've been checking):


{code:erlang}
(lldb) thread select 133
* thread #133: tid = 0x0084, 0x0000000012542048 beam.smp`erts_lookup_function_info [inlined] lookup_loc at beam_ranges.c:311, stop reason = signal SIGSTOP
    frame #0: 0x0000000012542048 beam.smp`erts_lookup_function_info [inlined] lookup_loc at beam_ranges.c:311
   308 	    }
   309
   310 	    pc = (Eterm) (BeamInstr) orig_pc;
-> 311 	    fi->fname_ptr = (Eterm *) (BeamInstr) line[MI_LINE_FNAME_PTR];
   312 	    low = (Eterm *) (BeamInstr) line[MI_LINE_FUNC_TAB+idx];
   313 	    high = (Eterm *) (BeamInstr) line[MI_LINE_FUNC_TAB+idx+1];
   314 	    while (high > low) {
{code}

Please let me know if you need any more information,

Thanks

@OTP-Maintainer

sverker said:

I think the first step to understand this crash is a full coredump. If I can choose, a Linux dump is preferred.

Another thing you could do is run with a debug-built emulator and see if you get an even better crash.

cd $ERL_TOP/erts/emulator && make TYPE=debug smp

Either run with $ERL_TOP/cerl -debug
or cp $ERL_TOP/bin/<target>/beam.debug.smp $RELEASE_ROOT/erts-X.Y/bin/beam.smp
and cp $ERL_TOP/bin/<target>/child_setup.debug $RELEASE_ROOT/erts-X.Y/bin/


/Sverker, Erlang/OTP Ericsson (sverker@erlang.org)

@OTP-Maintainer

chrisyunker said:

Sounds good, will do. Is there a private place I can upload the coredump to?

Thanks

@OTP-Maintainer

sverker said:

Currently we have no good private upload area ready to use.

Could you instead put the coredump somewhere I can download it
with a satisfactory level of security? For example, I could give you a public RSA key
to encrypt the file with.
 

@OTP-Maintainer

chrisyunker said:

Yes, I can do that if you give me an RSA public key.

Thanks

@OTP-Maintainer

sverker said:

{noformat}
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAwFZdHCdTVDsElqt5JYK6
ijL83lhq1twb1f14M6rEbqjmhzGXs9RoKdCkruRUVHT/zVqwNEq5B8H4iQTsN4J6
BxGTeV5yqwIQBzsl2XIuKPu8zbC11EtAcRJmbCmCbFjRI8+q4i7ULmn46RQl6gGa
CA5aRCFg2E4HC/jYUjcrrxF9VGULfCYG3cLeHMsFavRKslJ/hhLwLy9JzcOv5EBx
SeX5xxvMQGoIlB36+0k9y/NqU6F5pLwAeQdB1Q6+AfAPV6NvymGa/cskWu7FprBo
3Ut/5xIzJynXcifxBj6sAACS8Fx0TzHRyTkB5O5DMjJSPDcAO9GFp3vYbtOT8DNd
cQIDAQAB
-----END PUBLIC KEY-----
{noformat}

(make sure to get rid of any extra linefeeds when copy-pasting the key above)

Here are some simple enough steps to follow if you want: [http://www.czeskis.com/random/openssl-encrypt-file.html]


@OTP-Maintainer

chrisyunker said:

Put the files here: https://www.dropbox.com/sh/5onzgl8shn1vm9k/AADh2ivNCe3c7v6dSoYeI7GZa?dl=0

MD5 (core-beam-11-500-500-3749-1453849260.enc) = e0820813321f63792557e7ce9ce10eaa
MD5 (key.bin.enc) = 133b472475460beda561f1eb8b48b70c

I encrypted them according to that link, so follow that to decrypt.

The core dump came from a CentOS 6.7 Docker container running on this image:

Linux d815d2554a50 4.1.13-boot2docker #1 SMP Fri Nov 20 19:05:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Let me know if you need any more information,

Thanks

@OTP-Maintainer

sverker said:

Thank you, that worked well.

But I forgot to also ask for the matching beam executable file, to get correct symbol information.
The core says the executable filename is .../erts-7.2.1/bin/beam, which means this is a non-smp emulator.


@OTP-Maintainer

chrisyunker said:

That's been added. Encrypted the same as the other files.

Yes, this was run in a docker container with one core, so it compiled to use non-smp beam.

To be clear, when I run the same test suite on my multi-core Mac, I've verified that it runs beam.smp and segfaults in the same (or a very similar) location.

So a non-smp core dump is probably good since it simplifies things.

Again, let me know if you need any more info. Thanks for the quick response.

Chris Yunker

@OTP-Maintainer

sverker said:

Here is a patch that I think will fix the problem:

{noformat}
diff --git a/erts/emulator/beam/beam_load.c b/erts/emulator/beam/beam_load.c
index b70e5b9..a6dce2d 100644
--- a/erts/emulator/beam/beam_load.c
+++ b/erts/emulator/beam/beam_load.c
@@ -6249,6 +6249,7 @@ erts_make_stub_module(Process* p, Eterm Mod, Eterm Beam, Eterm Info)
     code[MI_LITERALS_END] = 0;
     code[MI_LITERALS_OFF_HEAP] = 0;
     code[MI_ON_LOAD_FUNCTION_PTR] = 0;
+    code[MI_LINE_TABLE] = 0;
     code[MI_MD5_PTR] = 0;
     ci = MI_FUNCTIONS + n + 1;
{noformat}

@OTP-Maintainer

chrisyunker said:

Thanks. That looks like it might have fixed it. Since applying that patch, we have not seen a segfault.

@OTP-Maintainer

sverker said:

Good. I'll put this fix in the pipe for OTP-18.3.

The crash was caused by process_info(Pid, [current_location|...]) on a process executing in a HiPE-compiled module, where the pointer to the source line information was not properly initialized (to NULL).
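
To illustrate why an uninitialized line-table slot can crash a location lookup, here is a minimal, self-contained sketch. It is not the ERTS code: the header layout, `make_stub_header`, `lookup_location`, and `LocInfo` are hypothetical stand-ins, and the junk fill only imitates an allocator that does not zero memory. It also assumes the lookup skips a NULL line table, so the guard only helps when the slot has actually been set to zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified module-header layout. The real MI_* indices and
 * line-table layout in ERTS differ; only the idea is modelled here. */
enum { HDR_ON_LOAD_PTR, HDR_LINE_TABLE, HDR_MD5_PTR, HDR_SIZE };
enum { LINE_FNAME_PTR };   /* slot 0 of the simplified line table */

typedef struct {
    int       found;       /* 1 if line information was available */
    uintptr_t fname_ptr;   /* value read from the line table */
} LocInfo;

/* Allocate a stub header. The fresh block is filled with a junk pattern to
 * imitate an allocator that does not zero its memory. */
static uintptr_t *make_stub_header(int zero_line_table)
{
    uintptr_t *hdr = malloc(HDR_SIZE * sizeof *hdr);
    if (hdr == NULL)
        return NULL;
    memset(hdr, 0xAB, HDR_SIZE * sizeof *hdr);
    hdr[HDR_ON_LOAD_PTR] = 0;
    if (zero_line_table)
        hdr[HDR_LINE_TABLE] = 0;   /* analogue of the one-line fix */
    hdr[HDR_MD5_PTR] = 0;
    return hdr;
}

/* Look up location info. The NULL guard protects only headers whose slot
 * was initialized; a junk pointer passes the check and gets dereferenced. */
static void lookup_location(LocInfo *out, const uintptr_t *hdr)
{
    const uintptr_t *line;

    assert(out != NULL && hdr != NULL);
    line = (const uintptr_t *) hdr[HDR_LINE_TABLE];
    out->found = 0;
    out->fname_ptr = 0;
    if (line == NULL)
        return;                    /* no line table: report "unknown" */
    out->fname_ptr = line[LINE_FNAME_PTR];
    out->found = 1;
}
```

Calling `lookup_location` on a header from `make_stub_header(1)` reports no location and returns safely. A header from `make_stub_header(0)` still holds the junk pattern in its line-table slot, so the same call would follow a wild pointer, which is the kind of access the backtrace above shows.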







@OTP-Maintainer OTP-Maintainer added bug Issue is reported as a bug team:VM Assigned to OTP team VM priority:medium labels Feb 10, 2021
@OTP-Maintainer OTP-Maintainer added this to the OTP-18.3 milestone Feb 10, 2021