Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-fatal panic appears in paramcache logs #551

Closed
laser opened this issue Mar 13, 2019 · 13 comments
Closed

non-fatal panic appears in paramcache logs #551

laser opened this issue Mar 13, 2019 · 13 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@laser
Copy link
Contributor

laser commented Mar 13, 2019

Description

After the feat/bench-constraint-system branch was merged, paramcache started logging a panic to stdout in CircleCI. This panic doesn't cause paramcache to exit with a non-zero status code.

The first occurrence of this error can be found here.

A subset of the logs from the aforementioned build run which contain the panic:

Mar 13 07:41:10.281 INFO checking cache_path: "/root/.filecoin-parameter-cache/v10-vdf-post-c8ddc8e74f9c2ed6a4bc83df171a9e9e84e242cf6e48e20f71585106fa8a3ce6", id: vdf-post-c8ddc8e74f9c2ed6a4bc83df171a9e9e84e242cf6e48e20f71585106fa8a3ce6, target: params, place: storage-proofs/src/parameter_cache.rs:138 storage_proofs::parameter_cache, root: storage-proofs
Mar 13 07:41:10.281 INFO reading groth params from cache: "/root/.filecoin-parameter-cache/v10-vdf-post-c8ddc8e74f9c2ed6a4bc83df171a9e9e84e242cf6e48e20f71585106fa8a3ce6", target: params, place: storage-proofs/src/parameter_cache.rs:202 storage_proofs::parameter_cache, root: storage-proofs
Mar 13 07:41:12.983 INFO groth_parameter_bytes: 294817848, target: stats, place: storage-proofs/src/parameter_cache.rs:208 storage_proofs::parameter_cache, root: storage-proofs
Mar 13 07:41:12.983 INFO Getting verifying key., id: vdf-post-c8ddc8e74f9c2ed6a4bc83df171a9e9e84e242cf6e48e20f71585106fa8a3ce6.vk, target: verifying_key, place: storage-proofs/src/parameter_cache.rs:162 storage_proofs::parameter_cache, root: storage-proofs
Mar 13 07:41:thread '<unnamed>' panicked at 'cannot access stderr during shutdown', src/libcore/option.rs
@laser laser self-assigned this Mar 13, 2019
@laser laser added the bug Something isn't working label Mar 13, 2019
@laser laser added the good first issue Good for newcomers label Apr 18, 2019
vmx pushed a commit that referenced this issue Apr 26, 2019
The asynchronous logger was logging panics but didn't fail on them.

Fixes #551.
@vmx
Copy link
Contributor

vmx commented Apr 26, 2019

This isn't an issue anymore. I've introduced a random panic here

. I was able to see the panic in the logs and an echo $? returned 101.

Sadly I didn't check if that is still an issue early on. I found this branch https://github.com/filecoin-project/rust-fil-proofs/compare/bugs/551-panic-in-paramcache-logs and rebased it to master. Thanks @laser for the work, I really didn't do anything, just squashed the two commits and improved the code slightly. I've pushed it in case we ever think sync logging is better: https://github.com/filecoin-project/rust-fil-proofs/tree/sync-logger

@laser
Copy link
Contributor Author

laser commented Apr 26, 2019

This isn't an issue anymore.

@vmx - I still see the panic in recent CI builds of the rust-fil-proofs project (check out end of the Ensure cache is hydrated with PoRep and PoSt Groth parameters (for test) output). It's not a problem, I suppose, because it's not causing builds to fail - but it probably does imply that there's something configured incorrectly somewhere.

I've pushed it in case we ever think sync logging is better: https://github.com/filecoin-project/rust-fil-proofs/tree/sync-logger

I haven't really investigated the sync logging solution. My guess is that the sync logger synchronizes (surprise!) log writes requested by different threads. Is that true? If so, we'd need to be cautious when logging from spawned/child threads to ensure that we didn't create contention for the logger.

@vmx
Copy link
Contributor

vmx commented Apr 27, 2019

@laser: perhaps we can add a echo $? to that build step to see if it's a CI issue or if the return value is swallowed.

@laser
Copy link
Contributor Author

laser commented Apr 29, 2019

@vmx - Oddly enough, the panic doesn't impact the exit code of the paramcache command. Check out this CI build, running this commit.

@vmx
Copy link
Contributor

vmx commented Apr 29, 2019

Sounds like the panic is not where I've put it. What about running it on CI with RUST_BACKTRACE=1 so we can add a fake panic where it happened for proper reproduction.

@vmx
Copy link
Contributor

vmx commented May 10, 2019

Here's what I found out. If a thread panics, the main process won't panic. Instead you need to call join()
on the spawed thread and check the result. Here's a minimal example:

use std::thread;

fn main() {
    let result = thread::spawn(move || {
        panic!("You see this panic, but the main thread won't terminate");
    }).join();

    println!("result: {:?}", result);
}

Output:

$ ./threadpanic_minimal
thread '<unnamed>' panicked at 'You see this panic, but the main thread won't terminate', threadpanic_minimal.rs:5:9
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
result: Err(Any)

From looking through the code base, threads are always joined correctly. So the question is, where does this error actually happen. I have no clue how to find that out.

@laser
Copy link
Contributor Author

laser commented May 10, 2019 via email

@vmx
Copy link
Contributor

vmx commented May 10, 2019

@laser So you mean in the slog-async module and not our own code?

@vmx
Copy link
Contributor

vmx commented May 10, 2019

slog-rs/async#11 perhaps?

@laser
Copy link
Contributor Author

laser commented May 10, 2019

@vmx

@laser So you mean in the slog-async module and not our own code?

That's my current hypothesis, yep.

After doing some binging, I found this thread which addresses a similar-looking panic.

@vmx
Copy link
Contributor

vmx commented May 10, 2019

Use slog-async master still panics https://circleci.com/gh/filecoin-project/rust-fil-proofs/19612.

I'm not of much help here. I understand only little on what those issues talk about :)

@dignifiedquire
Copy link
Contributor

@laser are you actively working on this?

@laser
Copy link
Contributor Author

laser commented May 15, 2019

@dignifiedquire - No, I am not. @vmx has done some investigation into the cause of this behavior and I've been supporting those efforts, albeit poorly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants