US1886 Collect debug information from container when zrepl crashes #67
LGTM
Hi @jkryl
Thanks for the changes. I have a few doubts about the entrypoint.sh changes. Please have a look.
echo '/tmp/core.%h.%e.%t' > /proc/sys/kernel/core_pattern
ulimit -c unlimited

exec /usr/local/bin/zrepl
Here, we are executing zrepl through exec, so the entrypoint.sh process image will be replaced by the zrepl process image, and call_exit will never be executed.
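The point about exec can be checked in isolation. The following is a minimal sketch, not the project's entrypoint: the function names `demo_with_exec` and `demo_without_exec` are hypothetical, and `true` stands in for zrepl. It sets an EXIT trap and then either replaces the shell with `exec` or lets the shell exit normally.

```shell
#!/bin/sh
# Sketch: an EXIT trap set before `exec` never fires, because exec replaces
# the shell process image instead of letting the shell exit.

# Child shell sets the trap, then is replaced by the external `true` program.
demo_with_exec() {
    sh -c 'trap "echo trap fired" EXIT; exec true'
}

# Child shell sets the trap, runs a builtin no-op, then exits on its own.
demo_without_exec() {
    sh -c 'trap "echo trap fired" EXIT; :'
}

echo "with exec:    [$(demo_with_exec)]"
echo "without exec: [$(demo_without_exec)]"
```

Only the second variant prints "trap fired". The trap still runs on any failure before the `exec` line, which is the one case where it could be useful in entrypoint.sh.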
This code was taken from the cstor repo. If I understand it right, the purpose of this trap is to print the line number in the shell script if we exit prematurely, before we get to the last line where zrepl is executed. The script can exit at any line because of the set -o errexit option. In that case it prints the line number to give a clue as to where the entrypoint script failed.
However, when I tested an error path in the script, all it prints is:
entrypoint.sh: 14: entrypoint.sh: cannot create /proc/sys/kernel/core_pattern: Permission denied
at call_exit..
exit code: 0
reference: entrypoint.sh
I don't see any value in the last three lines, which are generated by the trap function. I will remove it. Thanks.
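The error path tested above is the write to /proc/sys/kernel/core_pattern, which aborts the script under errexit. One possible way to tolerate that failure is sketched below. The helper name `set_core_pattern` and its optional second argument are hypothetical and not part of the PR; the second argument exists only so the function can be tried against an ordinary file.

```shell
#!/bin/sh
# Hypothetical helper: try to set the core pattern, but only warn on failure.
# $1 is the pattern; $2 (assumed, optional) overrides the procfs path.
set_core_pattern() {
    pattern=$1
    target=${2:-/proc/sys/kernel/core_pattern}
    # The braces let us silence the shell's own "cannot create" message.
    if { printf '%s\n' "$pattern" > "$target"; } 2>/dev/null; then
        return 0
    fi
    echo "warning: cannot write $target; core pattern left unchanged" >&2
    return 1
}

# Possible use in an entrypoint with errexit enabled; the `|| true` keeps a
# failed write from aborting the script:
# set_core_pattern '/tmp/core.%h.%e.%t' || true
```

Whether continuing without a core pattern is acceptable is a design choice for the maintainers; the sketch only shows that the failure can be reported without stopping the script.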
entrypoint.sh
Outdated
call_exit()
{
echo "at call_exit.."
echo "exit code:" $?
I think the return value should be passed in through the trap:
trap 'call_exit $? $LINE_NO' EXIT
entrypoint.sh
Outdated
#!/bin/sh

set -o errexit
trap 'call_exit $LINE_NO' EXIT
Does it take the $LINE_NO value?
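The two review points on the trap can be illustrated together. This is a sketch under stated assumptions, not the code from the PR: it uses the shell's standard `LINENO` variable rather than `LINE_NO`, which the shell does not set and which therefore expands to nothing unless defined elsewhere. It also passes `$?` as an argument, since inside the handler the first `echo` overwrites `$?` with its own status (which is why the tested output showed "exit code: 0"). The wrapper `demo_trap` is hypothetical and runs the failing script in a subshell so the demonstration itself keeps going.

```shell
#!/bin/sh
# Handler receives the exit status and line number as arguments, because $?
# read inside the function would reflect the preceding echo, not the failure.
call_exit() {
    status=$1
    line=$2
    echo "at call_exit.."
    echo "exit code: $status"
    echo "line: $line"
}

# Hypothetical wrapper: a subshell with errexit and the EXIT trap, where a
# failing command (`false`) ends the subshell early.
demo_trap() {
    (
        set -o errexit
        # Single quotes delay expansion of $? and $LINENO until the trap runs.
        trap 'call_exit $? $LINENO' EXIT
        false
        echo "not reached"
    )
    return 0
}

demo_trap
```

The reported exit code is that of the failing command. The value of `$LINENO` seen inside an EXIT trap differs between shells, so it may not point at the failing line and should be treated as a hint at best.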
7ae8ef4 to 8f63091
Signed-off-by: Jan Kryl <jan.kryl@cloudbyte.com>
Core dump support is more complicated and will not work in a k8s environment, though it has been verified to work in a local test environment using the docker image from the zfs repo. We face two problems here:
The core dump problem can be solved once either of the two problems mentioned above is solved. For now we have to be content with just a stack trace from the crashed zrepl.