Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault with html5ever when using it in golang with a signal handler #393

Closed
joelwurtz opened this issue Sep 27, 2019 · 6 comments · Fixed by servo/string-cache#227

Comments

@joelwurtz
Copy link
Contributor

commented Sep 27, 2019

This is really a weird bug that occurs on one of our program, i managed to make a reproduce small example under this repository: https://github.com/joelwurtz/segfault-golang-with-rust

What it does:

  • We are parsing a html fragment which contains a not existing local name attribute under html5ever
  • We exposing this parsing under a static library (on the musl target)
  • We compile this static library on go with musl gcc
  • We create a signal channel before calling the exposed api
  • We call this function after which cause the segfault (which happens on the LocalName::from(&*attribute_name) call with a not defined local name

This result in a segfault 100% of the time

There a some workaround for this:

  • Adding the attribute into the local_names.txt file will fix the problem
  • Calling the exposed API before creating the signal handler will fix the problem

I really don't know what happens inside and it's really difficult to trace it due to the way of compiling this.

I'm pretty sure this is a race case on rust lang and will do also an issue there.

@SimonSapin

This comment has been minimized.

Copy link
Member

commented Sep 27, 2019

Are you able to get a more precise stack trace at the time of segfault? Perhaps with gdb? Are there multiple kernel threads in this process? Does more than one of them call into Rust code? Can they run at the same time, or does your reduced program only do one call into Rust? Is Rust code called from a signal handler? (I’m not familiar with Go but you mentioning a "signal channel" suggest that the handler only sends a message.)

Is the unknown attribute name more than 7 bytes? If so LocalName::from will eventually add an entry in a global hash map which is wrapped in std::sync::Mutex (for thread-safe mutability) and std::sync::Once (via lazy_static, for thread-safe initialization). Internally on Unix platforms, those are based on libc API like pthread_mutex_lock and pthread_cond_signal.

That said, I still have no idea how this could be affected so badly by settings up an unrelated signal handler.

@joelwurtz

This comment has been minimized.

Copy link
Contributor Author

commented Sep 27, 2019

Are you able to get a more precise stack trace at the time of segfault?

Unfortunetaly no, the only stack trace that i got is this one:

(gdb) run
Starting program: /segfault-html5ever-golang/main 
[New LWP 857023]
[New LWP 857024]
[New LWP 857025]
[New LWP 857026]
[New LWP 857027]
[New LWP 857028]
It works with html

Thread 5 "main" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 857026]
0x00000000004b2a13 in __rust_probestack () at /cargo/registry/src/github.com-1ecc6299db9ec823/compiler_builtins-0.1.16/src/probestack.rs:55
55	/cargo/registry/src/github.com-1ecc6299db9ec823/compiler_builtins-0.1.16/src/probestack.rs: No such file or directory.
(gdb) bt
#0  0x00000000004b2a13 in __rust_probestack () at /cargo/registry/src/github.com-1ecc6299db9ec823/compiler_builtins-0.1.16/src/probestack.rs:55
#1  0x0000000000000000 in ?? ()
(gdb) 

I believe using compiler_builtins on 0.1.19 would give me a better stack trace (thanks to rust-lang/compiler-builtins@985a430), but don't really know how to build the whole toolchain to have it

Are there multiple kernel threads in this process?

I believe there are since i got this from gdb (but not an expert on this subject)

[New LWP 857023]
[New LWP 857024]
[New LWP 857025]
[New LWP 857026]
[New LWP 857027]
[New LWP 857028]

Does more than one of them call into Rust code?

On our original program yes there are (since we are doing this on a web server that spawn threads),

Can they run at the same time, or does your reduced program only do one call into Rust?

Yes they can, but i made an example that is only doing one call so i don't think this is related

Is Rust code called from a signal handler?

No it's not, in fact in the example i don't have an handler, i just created the signal, but maybe creating a signal will put following code into a different context (not sure either)

Is the unknown attribute name more than 7 bytes? ...

Yes it is, and confirming that having an unknown attribute name under 8 bytes does not create a segfault

That said, I still have no idea how this could be affected so badly by settings up an unrelated signal handler.

Maybe not related to signal handler but everything that switch current thread (but not an expert on how golang does that :/) and thread-safe initialization is not done on the main thread so it segfault

@SimonSapin

This comment has been minimized.

Copy link
Member

commented Sep 27, 2019

#1  0x0000000000000000 in ?? ()

This looks like the Rust code in your program doesn’t have debug info. Do you call Cargo with --release? In gdb more useful if you remove that?

Removing --release will also disable compiler optimizations, which could potentially make the bug disappear. If that happens, see https://doc.rust-lang.org/cargo/reference/manifest.html#the-profile-sections for configuring Cargo with both optimizations and debug info.

confirming that having an unknown attribute name under 8 bytes does not create a segfault

Short strings are kept inline by string-cache / LocalName, so this does point to the bug being related to string-cache’s global hash map.

Calling the exposed API before creating the signal handler will fix the problem

… and this suggests a problem related to initialization with lazy_static!

thread-safe initialization is not done on the main thread so it segfault

The goal of lazy_static! is that initialization can be safely done on any thread. And further, multiple threads trying to do it at the same time should be properly synchronized (through std::sync::Once).

@joelwurtz

This comment has been minimized.

Copy link
Contributor Author

commented Sep 27, 2019

This looks like the Rust code in your program doesn’t have debug info. Do you call Cargo with --release?

No i'm using the debug version see https://github.com/joelwurtz/segfault-golang-with-rust/blob/master/main.go#L4 which got the bug

SimonSapin added a commit to servo/string-cache that referenced this issue Sep 30, 2019
SimonSapin added a commit to servo/string-cache that referenced this issue Sep 30, 2019
@SimonSapin

This comment has been minimized.

Copy link
Member

commented Sep 30, 2019

Thanks to nagisa for investigating in rust-lang/rust#64834. I can reproduce a segfault when creating a dynamic atom with a small stack:

fn main() {
    std::thread::Builder::new().stack_size(50_000).spawn(|| {
        string_cache::DefaultAtom::from("12345678");
    }).unwrap().join().unwrap()
}
     Running target/debug/deps/small_stack-128211424cdfd6be

thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
error: process didn't exit successfully: `/home/simon/projects/servo-deps/string-cache/target/debug/deps/small_stack-128211424cdfd6be small` (signal: 6, SIGABRT: process abort signal)

servo/string-cache#227 fixes it for me. Could you try it? See https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#overriding-dependencies about overriding the string_cache crate in your program with a local copy.

bors-servo added a commit to servo/string-cache that referenced this issue Sep 30, 2019
Fix initializing the global hash map with a small stack

Fix servo/html5ever#393

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/string-cache/227)
<!-- Reviewable:end -->
@joelwurtz

This comment has been minimized.

Copy link
Contributor Author

commented Sep 30, 2019

Yes it works, thanks a lot for fixing this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.