how to troubleshoot storm worker crash #466

Open
z11373 opened this issue Aug 28, 2019 · 1 comment

z11373 commented Aug 28, 2019

Hi, sorry to post this here, but I am hoping someone can suggest how to troubleshoot a worker crash we are seeing and find its cause. We are using streamparse for our Python code on Storm 1.1.1.
Below is the log I captured before the worker was recycled after the crash. I am running out of ideas on how to troubleshoot it, so I would really appreciate any ideas or pointers. Thanks!

2019-08-28 15:05:32.947 o.a.s.s.ShellSpout Thread-11-event_spout-executor[10 10] [INFO] Launched subprocess with pid 10054
2019-08-28 15:05:32.951 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [INFO] Opened spout event_spout:(10)
2019-08-28 15:05:32.953 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [INFO] Activating spout event_spout:(10)
2019-08-28 15:05:32.953 o.a.s.s.ShellSpout Thread-11-event_spout-executor[10 10] [INFO] Start checking heartbeat...
2019-08-28 15:05:32.961 o.a.s.util Thread-11-event_spout-executor[10 10] [ERROR] Async loop died!
java.lang.RuntimeException: pid:10054, name:event_spout exitCode:-1, errorString:
at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:218) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.spout.ShellSpout.sendSyncCommand(ShellSpout.java:145) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.spout.ShellSpout.activate(ShellSpout.java:266) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.daemon.executor$fn__4962$fn__4977$fn__5008.invoke(executor.clj:641) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:484) [storm-core-1.1.1.jar:1.1.1]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:

    at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:183) ~[storm-core-1.1.1.jar:1.1.1]
    ... 6 more

2019-08-28 15:05:32.968 o.a.s.d.executor Thread-11-event_spout-executor[10 10] [ERROR]
java.lang.RuntimeException: pid:10054, name:event_spout exitCode:-1, errorString:
at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:218) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.spout.ShellSpout.sendSyncCommand(ShellSpout.java:145) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.spout.ShellSpout.activate(ShellSpout.java:266) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.daemon.executor$fn__4962$fn__4977$fn__5008.invoke(executor.clj:641) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:484) [storm-core-1.1.1.jar:1.1.1]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:

    at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.spout.ShellSpout.querySubprocess(ShellSpout.java:183) ~[storm-core-1.1.1.jar:1.1.1]
    ... 6 more

2019-08-28 15:05:33.009 o.a.s.util Thread-11-event_spout-executor[10 10] [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341) [storm-core-1.1.1.jar:1.1.1]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
at org.apache.storm.daemon.worker$fn__5632$fn__5633.invoke(worker.clj:763) [storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.daemon.executor$mk_executor_data$fn__4848$fn__4849.invoke(executor.clj:276) [storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.util$async_loop$fn__557.invoke(util.clj:494) [storm-core-1.1.1.jar:1.1.1]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
2019-08-28 15:05:33.018 o.a.s.d.worker Thread-16 [INFO] Shutting down worker tmon-4-1567019114 ba5b3695-b390-4c3e-9d92-af0771f17b86 6700

z11373 commented Aug 30, 2019

I tracked down the issue: it is actually a segfault, which causes the Storm worker to die. Here is what I found in /var/log/messages:

[165495.820435] streamparse_run[9133]: segfault at 0 ip (null) sp 00007fff94220478 error 14 in bbpy2.7[400000+11ff000]
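
The fields in that kernel line can be decoded mechanically. The following is a minimal sketch, not an official tool: it assumes the usual Linux x86 `segfault at ... ip ... sp ... error ...` message layout, that the `error` value is printed in hexadecimal, and the standard x86 page-fault error-code bits. The names `parse_segfault` and `SEGFAULT_RE` are made up for illustration.

```python
import re

# Assumed layout of the Linux x86 segfault message (all numbers are hex).
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>\S+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    code = int(m.group("error"), 16)
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "ip": m.group("ip"),          # kept as text: may be "(null)"
        "object": m.group("object"),  # mapping named after "in", if present
        "flags": {
            # x86 page-fault error-code bits (assumed layout)
            "page_present": bool(code & 0x01),
            "write": bool(code & 0x02),
            "user_mode": bool(code & 0x04),
            "reserved_bit": bool(code & 0x08),
            "instruction_fetch": bool(code & 0x10),
        },
    }


line = ("[165495.820435] streamparse_run[9133]: segfault at 0 ip (null) "
        "sp 00007fff94220478 error 14 in bbpy2.7[400000+11ff000]")
print(parse_segfault(line))
```

Under those assumptions, `error 14` decodes to a user-mode instruction fetch from an unmapped page at address 0. That would be consistent with the null `ip`, and such a pattern typically suggests a call through a null function pointer in native code (for example a C extension) rather than a bug in pure Python code.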

Still, I need help in troubleshooting this issue, so any help is really appreciated. Thanks!
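
One possible next step, offered as a hedged sketch rather than a confirmed fix, is to have the Python subprocess dump its Python-level traceback when it receives the fatal signal. This assumes the `faulthandler` module is importable in the interpreter that runs the spout: it is in the standard library from Python 3.3, while on Python 2.7 it would have to come from the separate backport package, which has not been checked against the `bbpy2.7` build in this thread. The function name `enable_crash_traces` and the log file name are hypothetical.

```python
import faulthandler
import os
import tempfile


def enable_crash_traces(log_dir=None):
    """Dump Python tracebacks to a per-process file on SIGSEGV and similar.

    The dump goes to a file rather than stdout, because Storm's multilang
    protocol uses the subprocess's stdout for its own messages.
    """
    log_dir = log_dir or tempfile.gettempdir()
    path = os.path.join(log_dir, "faulthandler-%d.log" % os.getpid())
    # The file object must stay open for as long as faulthandler is enabled,
    # so it is returned to the caller instead of being closed here.
    fh = open(path, "a")
    faulthandler.enable(file=fh, all_threads=True)
    return fh, path
```

Calling this once at import time of the module that defines the spout, before anything that loads native extensions, should leave a traceback file showing which Python call was active when the process crashed, assuming the crash happens on a thread running Python code.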
