US1886 Collect debug information from container when zrepl crashes #67

Merged
jkryl merged 1 commit into zfs-0.7-release from stacktrace on Jun 15, 2018

@jkryl commented on Jun 14, 2018

Core dump support is more complicated and will not work in a k8s environment, although it has been verified to work in a local test environment using the Docker image from the zfs repo. We face two problems here:

  1. Linux core_pattern is global and cannot be set independently for a cgroup. It is very unlikely that this will be fixed, hence ...
  2. People are investigating the possibility of piping core dumps to an external program that could then handle them according to the k8s configuration (see the coredump detector proposal, kubernetes/community#1311). That ticket seems stuck.
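For illustration, here is a minimal sketch of what the external program from item 2 could look like. When core_pattern starts with a pipe character, the kernel runs the named program on the host (outside the container) for each crash, writes the core image to its stdin and passes the % specifiers as arguments. Because the setting is global (item 1), such a handler would see crashes from every process on the node, not just zrepl. The handler path, the save_core function and the CORE_DIR location are hypothetical and not part of this PR.

```shell
#!/bin/sh
# Hypothetical handler for a piped core_pattern, installed on the node as root:
#   echo '|/usr/local/bin/core-handler.sh %h %e %t' > /proc/sys/kernel/core_pattern
# %h, %e and %t arrive as arguments: hostname, executable name, dump time
# (seconds since the epoch). The core image itself arrives on stdin.

save_core() {
    host=$1 exe=$2 ts=$3
    dest_dir=${CORE_DIR:-/var/lib/cores}   # assumed location, adjust as needed
    mkdir -p "$dest_dir" || return 1
    # Copy the core image from stdin into a file named like the PR's pattern.
    cat > "$dest_dir/core.$host.$exe.$ts"
}

# When installed as the handler itself, the script would end with:
#   save_core "$@"
```

Feeding any data on stdin, as the kernel would, produces a file such as core.node1.zrepl.1528963200 under CORE_DIR.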

The core dump problem can be solved once either of the two problems above is solved. For now we have to settle for just a stack trace from the crashed zrepl.

@gila left a comment

LGTM

@mynktl (Member) left a comment

Hi @jkryl,
Thanks for the changes. I have a few questions about the entrypoint.sh changes. Please have a look.

```shell
echo '/tmp/core.%h.%e.%t' > /proc/sys/kernel/core_pattern
ulimit -c unlimited

exec /usr/local/bin/zrepl
```
Member

Here we run zrepl through exec, so the entrypoint.sh process image is replaced by the zrepl process image and call_exit will never be executed.
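A small sketch of the behaviour described above, assuming a POSIX sh such as dash: the EXIT trap fires when the shell itself exits, but not once exec has replaced the shell with another program. Here true stands in for /usr/local/bin/zrepl, and the two function names are hypothetical.

```shell
#!/bin/sh
# The shell survives the command and exits via the `exit` builtin,
# so the EXIT trap fires.
without_exec() {
    sh -c 'trap "echo trap ran" EXIT; true; exit 0'
}

# exec replaces the shell's process image; no shell is left to run the trap.
with_exec() {
    sh -c 'trap "echo trap ran" EXIT; exec true'
}
```

Running without_exec prints "trap ran", while with_exec prints nothing.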

Author

This code was taken from the cstor repo. If I understand it correctly, the purpose of this trap is to print the line number in the shell script if we exit prematurely, before reaching the last line where zrepl is executed. Because of the set -o errexit option, the script can exit at any line; in that case the trap prints the line number to give a clue where the entrypoint script failed.

However, when I tested an error path in the script, all it printed was:

```
entrypoint.sh: 14: entrypoint.sh: cannot create /proc/sys/kernel/core_pattern: Permission denied
at call_exit..
exit code: 0
reference:  entrypoint.sh
```

I don't see any value in the last three lines, which are generated by the trap function, so I will remove it. Thanks.
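With set -o errexit, the failed write shown above ends the whole entrypoint before zrepl is started. A minimal sketch of one way around that, assuming it is acceptable to run without core dumps when the kernel setting is not writable (for example in an unprivileged pod): attempt the write as an if condition, where errexit does not apply, and continue with a warning. The enable_core_dumps name and its optional path argument (there only so the function can be tried against a temporary file) are hypothetical and not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper for entrypoint.sh: try to enable core dumps, but do not
# abort the script when the core_pattern file cannot be written.
enable_core_dumps() {
    pattern_file=${1:-/proc/sys/kernel/core_pattern}
    # A failing command inside an `if` condition does not trigger errexit.
    if { echo '/tmp/core.%h.%e.%t' > "$pattern_file"; } 2>/dev/null; then
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: cannot raise the core size limit" >&2
        return 0
    fi
    echo "warning: cannot write $pattern_file; core dumps disabled" >&2
    return 1
}

# In entrypoint.sh this might be used as:
#   enable_core_dumps || true
#   exec /usr/local/bin/zrepl
```

The trade-off is that a misconfigured node silently loses core dumps (apart from the warning), which seems preferable to zrepl not starting at all.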

entrypoint.sh (outdated)
```shell
call_exit()
{
echo "at call_exit.."
echo "exit code:" $?
```
Member

I think the return value should be passed in through the trap: `trap 'call_exit $? $LINE_NO' EXIT`.
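The "exit code: 0" in the output quoted earlier has a simple cause: inside call_exit, $? is read only after echo "at call_exit.." has run, so it reports that echo's status. Passing $? in the trap string expands it when the trap fires, before anything can overwrite it. A minimal sketch contrasting the two; false stands in for the failing core_pattern write, and the wrapper function names are hypothetical.

```shell
#!/bin/sh
# Original helper: $? is read after another echo, so it is always 0.
original_trap() {
    sh <<'EOF'
set -o errexit
call_exit()
{
    echo "at call_exit.."
    echo "exit code:" $?
}
trap 'call_exit' EXIT
false
EOF
}

# Suggested variant: the trap passes the status that caused the exit.
suggested_trap() {
    sh <<'EOF'
set -o errexit
call_exit()
{
    rc=$1
    echo "at call_exit.."
    echo "exit code:" "$rc"
}
trap 'call_exit $?' EXIT
false
EOF
}
```

original_trap prints "exit code: 0", while suggested_trap prints "exit code: 1", the status of the failed command.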

entrypoint.sh (outdated)
```shell
#!/bin/sh

set -o errexit
trap 'call_exit $LINE_NO' EXIT
```
Member

Does $LINE_NO actually get a value here?
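For what it is worth, $LINE_NO is not a variable the shell sets; it is an ordinary, unset variable and expands to an empty string. The shell-maintained variable specified by POSIX is LINENO, and even that, when expanded inside an EXIT trap, is not guaranteed to point at the line that failed. A quick check, with a hypothetical function name:

```shell
#!/bin/sh
# Print both names from a fresh shell: LINE_NO is empty, while LINENO
# (if the shell implements it) holds the current line number.
show_line_vars() {
    sh <<'EOF'
echo "LINE_NO=[$LINE_NO]"
echo "LINENO=[$LINENO]"
EOF
}
```

The first line of output is LINE_NO=[]; the second depends on the shell's LINENO support.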

@jkryl force-pushed the stacktrace branch 2 times, most recently from 7ae8ef4 to 8f63091 on June 15, 2018 08:25
Signed-off-by: Jan Kryl <jan.kryl@cloudbyte.com>
@jkryl merged commit 2191a65 into zfs-0.7-release on Jun 15, 2018
@jkryl deleted the stacktrace branch on June 15, 2018 15:18