Panic from Rust code when parsing possibly bad input #26

Closed
aclemmensen opened this issue Sep 23, 2017 · 9 comments

@aclemmensen

Embarrassingly, I don't have a repro or much information about what happens, but I get this message on stderr right before the entire application shuts down:

thread '<unnamed>' panicked at 'expected parent found none', src/flat_dom.rs:162:28
note: Run with `RUST_BACKTRACE=1` for a backtrace.
thread '<unnamed>' panicked at 'expected parent found none', src/flat_dom.rs:162:28
thread '<unnamed>' panicked at 'expected parent found none', src/flat_dom.rs:162:28
erl_child_setup closed

This error happens in an application that parses thousands of pages from the same site as a streaming operation, and it takes down the entire application when it occurs. Ideally this parser failure would be returned as an error tuple or something to that effect, so I could either discard the offending page or log the issue somewhere.
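For illustration, one general way Rust code can turn a panic into an ordinary error value is `std::panic::catch_unwind`. The sketch below is only that, a sketch: `parse_document` and `parse_checked` are hypothetical names, the trigger string is made up, and nothing here describes how this library or its NIF layer actually handles panics.

```rust
use std::panic;

/// Hypothetical stand-in for a parser; it panics on one made-up input.
fn parse_document(html: &str) -> usize {
    if html.contains("<orphan>") {
        panic!("expected parent found none");
    }
    html.len()
}

/// Runs the parser and converts a panic into `Err` carrying the panic message,
/// so the caller can log or discard the input instead of crashing.
fn parse_checked(html: &str) -> Result<usize, String> {
    panic::catch_unwind(|| parse_document(html)).map_err(|payload| {
        // Panic payloads are usually a &str or a String, depending on how
        // the panic was raised; anything else gets a generic message.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

Under these assumptions, `parse_checked` returns `Err("expected parent found none")` for the bad input, which a caller on the Elixir side could then receive as something like an error tuple. The default panic hook still prints the message to stderr unless it is replaced.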

Any pointers on how I may capture more information when this sort of crash happens would be greatly appreciated. I don't have much experience working with NIFs or Rust for that matter.

PS: I have since set the RUST_BACKTRACE variable and I will update this issue if I capture more output from crashed applications.

@mischov
Owner

mischov commented Sep 23, 2017

Thank you for raising this issue.

If you could figure out some way to provide the HTML that causes the panic, that would be ideal and would allow me to get at the root cause. I do understand, though, that the panic is not making it easy to figure out where things are messing up. Would you be able to do something like log the url of each page right before you attempt to parse a page, and then check which url is last logged before the application dies?

I also appreciate you bringing the issue of panics not providing any good debug info to my attention because that is not very user-friendly at all.

@aclemmensen
Author

I've got a few stack traces now but I'm not sure it helps much:

thread '<unnamed>' panicked at 'expected parent found none', src/flat_dom.rs:162:28
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
             at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::_print
             at /checkout/src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at /checkout/src/libstd/sys_common/backtrace.rs:60
             at /checkout/src/libstd/panicking.rs:380
   3: std::panicking::default_hook
             at /checkout/src/libstd/panicking.rs:396
   4: std::panicking::rust_panic_with_hook
             at /checkout/src/libstd/panicking.rs:611
   5: std::panicking::begin_panic_new
   6: meeseeks_html5ever_nif::flat_dom::FlatDom::get_parent_and_index
   7: meeseeks_html5ever_nif::flat_dom::FlatDom::remove_from_parent
   8: <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tree_builder::actions::TreeBuilderActions<Handle>>::adoption_agency
   9: <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tree_builder::rules::TreeBuilderStep<Handle>>::step
  10: <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tokenizer::interface::TokenSink>::process_token
  11: <html5ever::tokenizer::Tokenizer<Sink>>::process_token
  12: <html5ever::tokenizer::Tokenizer<Sink>>::emit_current_tag
  13: <html5ever::tokenizer::Tokenizer<Sink>>::step
  14: <html5ever::tokenizer::Tokenizer<Sink>>::run
  15: tendril::stream::TendrilSink::one
  16: std::panicking::try::do_call
  17: __rust_maybe_catch_panic
             at /checkout/src/libpanic_unwind/lib.rs:98
  18: <F as scoped_pool::Task>::run
  19: scoped_pool::Pool::run_thread
erl_child_setup closed

Regarding the URLs: yeah, I'll probably have to do that. The output volume is going to be pretty extreme, but I'll see if I can catch a repro that way.

@mischov
Owner

mischov commented Sep 23, 2017

That might be enough info for me to fix the problem.

It's gonna be interesting to test without data to reproduce the error, though, so if you can figure out a way to get the HTML causing the problem it would be great.

@aclemmensen
Author

I've just deployed with additional debugging. I'll update here when I find a smoking gun. :) Thanks for your attention to this, by the way.

@aclemmensen
Author

aclemmensen commented Sep 23, 2017

I have a repro now. It's this URL: http://hcasouthatlantic.com/careers/search.dot?jobId=26573-137096&src=CWS-10230

Ironically, that seems to be down right now, at least for me. But I've uploaded the HTML that causes the issue here: https://pastebin.com/w6hkqLtB

I got a more detailed stack as well:

stack backtrace:
   0:     0x7f95a8b2bc63 - std::sys::imp::backtrace::tracing::imp::unwind_backtrace::hcdf51e4c9dc54357
                               at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1:     0x7f95a8b26474 - std::sys_common::backtrace::_print::h9da91fd31a37d0f1
                               at /checkout/src/libstd/sys_common/backtrace.rs:71
   2:     0x7f95a8b39283 - std::panicking::default_hook::{{closure}}::h46820a72bf0cb624
                               at /checkout/src/libstd/sys_common/backtrace.rs:60
                               at /checkout/src/libstd/panicking.rs:380
   3:     0x7f95a8b38ff2 - std::panicking::default_hook::h4c1ef1cc83189c8e
                               at /checkout/src/libstd/panicking.rs:396
   4:     0x7f95a8b39747 - std::panicking::rust_panic_with_hook::h99016f44bdcb8544
                               at /checkout/src/libstd/panicking.rs:611
   5:     0x7f95a8a8b6ba - std::panicking::begin_panic_new::h25861ade5cc6b69c
   6:     0x7f95a8ab1518 - meeseeks_html5ever_nif::flat_dom::FlatDom::get_parent_and_index::h5a4d6eef62f21820
   7:     0x7f95a8ab1554 - meeseeks_html5ever_nif::flat_dom::FlatDom::remove_from_parent::ha147b0640a7e74db
   8:     0x7f95a8a84a2f - <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tree_builder::actions::TreeBuilderActions<Handle>>::adoption_agency::hfeb0ae5fa2dfb324
   9:     0x7f95a8a7b3ba - <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tree_builder::rules::TreeBuilderStep<Handle>>::step::h83603b59d9da5300
  10:     0x7f95a8a6e48f - <html5ever::tree_builder::TreeBuilder<Handle, Sink> as html5ever::tokenizer::interface::TokenSink>::process_token::h25c681880ae9da9f
  11:     0x7f95a8a9208f - <html5ever::tokenizer::Tokenizer<Sink>>::process_token::hf8bdb3ee0e1402e3
  12:     0x7f95a8a92558 - <html5ever::tokenizer::Tokenizer<Sink>>::emit_current_tag::h7426da1a749e1bd6
  13:     0x7f95a8a9c47d - <html5ever::tokenizer::Tokenizer<Sink>>::step::hf7bc78493df7e993
  14:     0x7f95a8a93efd - <html5ever::tokenizer::Tokenizer<Sink>>::run::h506621f0042d5741
  15:     0x7f95a8aaabf4 - tendril::stream::TendrilSink::one::ha91cbaffc0171b74
  16:     0x7f95a8a8b8f8 - std::panicking::try::do_call::h1b7398db8c3eef36
  17:     0x7f95a8b42a4c - __rust_maybe_catch_panic
                               at /checkout/src/libpanic_unwind/lib.rs:98
  18:     0x7f95a8a89d02 - <F as scoped_pool::Task>::run::h1dbda6e2e909e033
  19:     0x7f95a8afdcba - scoped_pool::Pool::run_thread::h74dcdf6fcd98ec41
  20:     0x7f95a8a8a1b2 - std::sys_common::backtrace::__rust_begin_short_backtrace::h9a693f6c5721b138
  21:     0x7f95a8a8bbf2 - std::panicking::try::do_call::h6225a116e969d91c
  22:     0x7f95a8b42a4c - __rust_maybe_catch_panic
                               at /checkout/src/libpanic_unwind/lib.rs:98
  23:     0x7f95a8a91e7b - <F as alloc::boxed::FnBox<A>>::call_box::hc15863d71ca87171
  24:     0x7f95a8b381cb - std::sys::imp::thread::Thread::new::thread_start::h10fad04495d944f7
                               at /checkout/src/liballoc/boxed.rs:661
                               at /checkout/src/libstd/sys_common/thread.rs:21
                               at /checkout/src/libstd/sys/unix/thread.rs:84
  25:     0x7f9657e00181 - start_thread
  26:     0x7f9657924fbc - __clone
  27:                0x0 - <unknown>

@mischov
Owner

mischov commented Sep 23, 2017

Okay, I have a solution (see the reference above), and I've both reproduced the error and confirmed that the proposed fix works. I'll see if I can figure out a good test to add to the library and get that fix out shortly.
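As a rough illustration of the failure mode in the backtraces (`adoption_agency` calling `remove_from_parent`, which calls `get_parent_and_index` and panics when no parent exists), here is a minimal sketch of a flat DOM whose `remove_from_parent` tolerates a detached node. The type and method names mirror the backtrace, but the fields and method bodies are assumptions made for this example; this is not the library's code and not necessarily the fix that was applied.

```rust
/// Hypothetical flat DOM node: nodes live in a Vec and refer to each other by index.
#[derive(Debug, Default)]
struct Node {
    parent: Option<usize>,
    children: Vec<usize>,
}

#[derive(Debug, Default)]
struct FlatDom {
    nodes: Vec<Node>,
}

impl FlatDom {
    /// Adds a detached node and returns its id.
    fn add_node(&mut self) -> usize {
        self.nodes.push(Node::default());
        self.nodes.len() - 1
    }

    /// Attaches `child` as the last child of `parent`.
    fn append(&mut self, parent: usize, child: usize) {
        self.nodes[child].parent = Some(parent);
        self.nodes[parent].children.push(child);
    }

    /// Returns the parent id and the child's position within it,
    /// or `None` for a detached node (instead of panicking).
    fn get_parent_and_index(&self, child: usize) -> Option<(usize, usize)> {
        let parent = self.nodes[child].parent?;
        let index = self.nodes[parent]
            .children
            .iter()
            .position(|&c| c == child)?;
        Some((parent, index))
    }

    /// Detaches `child` from its parent; a node without a parent is left untouched.
    fn remove_from_parent(&mut self, child: usize) {
        if let Some((parent, index)) = self.get_parent_and_index(child) {
            self.nodes[parent].children.remove(index);
            self.nodes[child].parent = None;
        }
    }
}
```

The design choice in this sketch is to treat "no parent" as an ordinary case by returning `Option`, so that calling `remove_from_parent` on an already detached node is a no-op.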

@aclemmensen
Author

That's perfect. Thank you so much. :)

@mischov
Owner

mischov commented Sep 23, 2017

Problem should be fixed as of v0.7.5.

Thank you again for reporting and for providing the data I needed to test.

@mischov
Owner

mischov commented Sep 23, 2017

I am closing this issue, but if your problem is not resolved please re-open.

@mischov mischov closed this as completed Sep 23, 2017
@mischov mischov added the A:Bug label Sep 23, 2017
@mischov mischov added A:Error and removed A:Bug labels Jun 23, 2018