Documentation: Make clear instructions for getting a core file, when container crashes #11740
Comments
+kind/writing |
...and another super-important-detail that everybody should be made aware of regarding core dumps*
Well hey! Guess what? The Like this: #!/bin/sh
ulimit -c unlimited So again, no core dumps for you my friend! No core file will be generated. And when it fails to set the value, ^^ That is far too easy to get wrong and IMHO needs to be documented also. |
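One way to make a failed `ulimit` call visible, instead of silently ending up with no core file, is to check the limit right after setting it. A minimal sketch, assuming a POSIX `sh` entrypoint script; the function name `raise_core_limit` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: raise the soft core-size limit and report the result.
raise_core_limit() {
    if ulimit -c unlimited 2>/dev/null; then
        echo "core limit is now: $(ulimit -c)"
    else
        # Raising the limit can fail, e.g. when the hard limit is lower.
        echo "could not raise core limit, still: $(ulimit -c)" >&2
        return 1
    fi
}
```

Calling this in the entrypoint before `exec "$@"` lets the launched program inherit the limit, since resource limits are inherited by child processes.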
More problems: When I set the core_pattern on host echo '/config/core.%h.%e.%t' > /proc/sys/kernel/core_pattern
echo '/recordings/core.%h.%e.%t' > /proc/sys/kernel/core_pattern ... no core file is created. Yet if I set the path to be echo '/tmp/core.%h.%e.%t' > /proc/sys/kernel/core_pattern I don't understand. Other problem: Setting shebang to [EDIT] I've since been told that setting the ulimit is only applied for the context (scope) and duration of the script. So the setting is not seen from other launched processes inside the container (e.g. The core file is created. Into Please someone else check these things out. Confirm / deny. |
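To see which filename a pattern such as `/tmp/core.%h.%e.%t` would produce, the specifiers can be expanded by hand. This is only a toy sketch of the substitution (the kernel does the real expansion itself); `expand_core_pattern` and its arguments are hypothetical and cover just `%h`, `%e`, `%t` and `%p`:

```shell
#!/bin/sh
# Hypothetical helper: expand a few core_pattern specifiers by hand.
# $1 = pattern, $2 = hostname (%h), $3 = executable name (%e),
# $4 = time of dump in epoch seconds (%t), $5 = PID (%p)
# Values must not contain '/', '&' or '\', since they are pasted into sed.
expand_core_pattern() {
    printf '%s\n' "$1" | sed \
        -e "s/%h/$2/g" \
        -e "s/%e/$3/g" \
        -e "s/%t/$4/g" \
        -e "s/%p/$5/g"
}

# Example:
#   expand_core_pattern '/tmp/core.%h.%e.%t' box tvheadend 1430000000 42
#   prints /tmp/core.box.tvheadend.1430000000
```

The directory part of the expanded path has to exist and be writable from the point of view of the crashing process, which is worth checking when no file shows up.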
Just tried adding this to my dockerfile: FROM ubuntu-debootstrap:14.04
RUN echo '* soft core 99999999' >> /etc/security/limits.conf
RUN echo '* hard core 99999999' >> /etc/security/limits.conf However, it did not appear to change the default value of ulimit. Still Anybody more experienced (knowledgeable) know if there is a chance to set the |
@dreamcat4 ulimit settings are inherited from docker engine, currently. |
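Since the limit is inherited rather than read from `limits.conf`, it can help to look at what a process actually ended up with. A small sketch, assuming a Linux `/proc` filesystem; `show_core_limit` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the core-size limit of a process.
# $1 = PID to inspect (defaults to the process doing the reading).
show_core_limit() {
    pid="${1:-self}"
    grep '^Max core file size' "/proc/$pid/limits"
}
```

Run inside the container, `show_core_limit 1` shows the soft and hard limits of the container's main process.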
@cpuguy83 Great. That sounds a lot better :) FYI, I am still searching for an answer why it only writes the core files to More reference: |
@dreamcat4 Thank you for this contribution. It is very helpful when users take the time to add these kind of issues. We will fix this. I'll ping you back so you can review. |
FYI: Have been working on an image with a debugging start script that does most of the necessary stuff. Still figuring out the folder permissions thing. (Which I suppose is the last piece of it). Image not ready yet. But until then here is a gist of the start script (the https://gist.github.com/dreamcat4/c2bea0e889de8860b035 AFAICT you can't really get away without a special entrypoint script too, such as the one in the gist above ^^. Because certain other things, in addition to the container's In other words: "ulimit -c unlimited is not enough" ... which I am guessing may be worth mentioning in the new documentation of the ulimit feature (issue #11754) ?? To help people out a bit with understanding what else they must also be doing to get their core dumps. Here is a summary of those steps:
Or else I think maybe my It cannot put back (restore) the |
OK. Gist updated with final fixes / improvements. https://gist.github.com/dreamcat4/c2bea0e889de8860b035

Didn't ever get to the bottom of the suspected permissions issue. It just 'went away all by itself' after moving my core dumps folder to be Tested several times with permissions 772, 775, 755, and as either same user or same group owning the core dumps folder (when g+w). Got a core dump each time. So whatever was wrong before I'm inclined to just ignore it and move on.

@moxiegirl This gist script ought to show most people well enough how they can take a core dump inside of their docker containers. It was written for a specific target program, but can be adapted for the general case. If you wish to action further documentation about core dumps, please do so as you wish, and continue on the other ticket where the new flag is being discussed (I didn't end up needing the new flag BTW). You may close my issue here and take anything of value over to the other ticket. Ticket: #11754

BTW: Another FYI (this time, for
|
Oh wait! There is a remaining (undocumented) problem regarding core dumps and docker. It is now identified as the reason the 'problem inexplicably went away' during testing: I had moved over to using a host-mounted volume, as shown in the YAML config file in the gist. Where the volume HOWEVER if the same folder is a regular docker volume e.g. the declaration

Since I am not at all sure about the underlying technical difference(s) between the 2 types of docker volumes, perhaps someone else with more relevant expertise in that area can comment, so that we can try to get to the bottom of this problem. It was the original source of frustration in the initial comments at the top of the page.

There is of course also a 3rd kind of target folder for core dumps, which is not any volume (and therefore is not persistent after container stop). I have not actually tested that situation, because I want to keep the core dumps after the container exits. That situation is slightly less useful for users, unless they copy the core file or send it (over network etc) once the core dump is completed, and just before the container exits. But it may still be a valid one, since then there is no need to declare an extra volume, and the destination for core files can be configured dynamically, log-rotated, etc.

So yet another shout out to Ticket #11754: it ought to be worth finding out about this remaining problem, if only to document it properly and give clear instructions to users. |
@dreamcat4, this is fantastic debugging. Docker team, please take note! Getting core dumps in Docker is nigh impossible without being a wizard! |
Wow, thank you @dreamcat4! We are just coming off the 1.6 release, so I should be able to pick this up in the coming weeks. (Other contributors are, of course, welcome to take it on sooner.) I'll be sure to ping you back for the PR review. |
FWIW, I'm having decent luck using So far the caveats have been:
This is without |
Great! Thanks for commenting here. |
Great to see that I'm not the only one who cares about postmortem analysis of failed Docker containers! For whatever it's worth, I'm going to mention this issue during my talk this afternoon at DockerCon 2015; hopefully that will help get some more attention on this issue, even if it only means clearer documentation! |
@bcantrill ok Brian. A working example where you can see it in action / take thing from is my https://registry.hub.docker.com/u/dreamcat4/tvh.debug/dockerfile/ Script: debug.sh https://github.com/dreamcat4/docker-images/blob/master/tvh/debug/stable/debug.sh Usage: https://github.com/dreamcat4/docker-images/blob/master/tvh/README.md#debugging-tvheadend Caveat:
|
Just thought I'd share one more discovery on this front. In order to test process crashes I tried using Killing the main PID Of the container with SIGABRT seems to be ignored both from the host or inside a My work around is to run the process under |
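For testing, a crash can be provoked in a child process rather than in PID 1, since the init process of a PID namespace generally only receives signals for which it has installed a handler. A minimal sketch, not the workaround from the comment above (whose tool name is not shown); `crash_test` is a hypothetical name and `sleep` merely stands in for the real program:

```shell
#!/bin/sh
# Hypothetical helper: abort a child process and report how it ended.
crash_test() {
    sleep 30 &                 # stand-in for the program under test
    child=$!
    kill -ABRT "$child"        # SIGABRT has a default action of "dump core"
    wait "$child"
    status=$?
    echo "child exit status: $status"
    # The shell reports death by signal N as 128+N; SIGABRT is signal 6.
    [ "$status" -eq 134 ]
}
```

Whether a core file is actually written still depends on the core limit and on `core_pattern`; the exit status only confirms that the signal was delivered.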
Great writeup @dreamcat4. Hitting the same problems today, glad to see it is well documented. |
More info on For my system: $ uname -a
Linux *** 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux My core_pattern file was a python script: $ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c Here is the apport python script source: It reports it's logs to This explains why no core file was created. I also noticed what dreamcat4 states above: if TLDR;
|
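A quick way to tell whether cores are being handed to a program such as apport, rather than written to a file, is to look at the first character of the pattern. A small sketch; `core_pattern_kind` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: classify a core_pattern value.
# $1 = the pattern string, e.g. "$(cat /proc/sys/kernel/core_pattern)"
core_pattern_kind() {
    case "$1" in
        '|'*) echo "pipe" ;;           # core is piped to a program's stdin
        /*)   echo "absolute-path" ;;  # core is written to this path
        *)    echo "relative-path" ;;  # relative to the process's cwd
    esac
}
```

With the apport setting quoted above this prints `pipe`, which matches the observation that no core file appears.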
@tjmehta could you please clarify what do you mean by "Prepend your container command w/ ulimit -c unlimited"? Do you suppose creating special entrypoint in the container image? |
I meant just using && : |
What is the end result of this discussion? Have the questions asked by @dreamcat4 been answered yet? If yes, can someone please point me to that? Thanks! |
I would like to point out that when I run command: docker run -i -t --privileged=true test:base /bin/bash |
@sdivekar I'd recommend setting the core_pattern at the docker host level. The pattern is a kernel-level parameter, so I suspect that if multiple containers try to set it to different things you'll end up with a last-write-wins case where some containers won't be able to dump a core file. Not sure if you saw my comment here, but setting the pattern at the host level also avoids the need for running in privileged mode. |
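Setting the pattern once on the host could look roughly like this. A sketch only: `set_core_pattern` is a hypothetical helper, and the optional second argument exists so the function can be tried against an ordinary file without root:

```shell
#!/bin/sh
# Hypothetical helper: write a core pattern and read it back.
# $1 = pattern, $2 = target file (defaults to the kernel setting).
set_core_pattern() {
    pattern="$1"
    target="${2:-/proc/sys/kernel/core_pattern}"
    printf '%s\n' "$pattern" > "$target" || return 1
    echo "core_pattern is now: $(cat "$target")"
}

# On the docker host, as root (path taken from the thread above):
#   set_core_pattern '/tmp/core.%h.%e.%t'
```

Reading the value back is cheap and catches the case where the write was refused.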
Hi, I opened an issue #19289 about core dump in docker. Do you have any demands or suggestions about this? |
@dreamcat4 @matschaffer I have tried setting /proc/SYS/DS/suid_dumpable is set to 1 as well |
Meant /proc/sys/fs/suid_dumpable and not /proc/SYS/DS/suid_dumpable i |
It appears that the application was using db, and the db client (oracle client)'s signal handling was preventing the generation of core dumps. Disabling that by adding DIAG_SIGHANDLER_ENABLED=FALSE to sqlnet.ora helped in generating the core. |
I would like to contribute my findings, which certainly overlap with what was said above, but might contain complementary bits:
|
@moxiegirl the above unassignment is definitely a bug, how would I have ever rights to do that? |
And I suppose if not wanting to use the full-blown security-opt:
- apparmor:unconfined
- seccomp:unconfined plus certain |
@dreamcat4 you probably need |
…ps. See moby/moby#11740 TLDR /proc/sys/kernel is controlled by docker host
I'd suggest that all of this works around the problem. Reducing the confinement of the application, sometimes significantly in order to manage the core isn't the solution that I see people wanting. Since Docker is orchestrating the container, the preference would be for Docker to configure the location and methods for the kernel to deliver the core file. Perhaps this could be inside of the container, or in a related location (outside of the container root, but still managed space such as a volume), or in the system wide location. |
Here is my docker VM
I got:
Looks like --privileged=true allows me to bypass the error. |
Just dropping my two cents. Running a swarm on Ubuntu 14.04 (default container permissions), with each container using Had to adjust the following on the host to get them:
After that I was able to debug the issue from within the container using |
Docker is a miserable environment for generating core files (e.g. with `gcore`), for accessing the memory of another process (`/proc/$pid/mem`) or for other stuff involving ptrace. Thus, the pargs cases that depend on these features are skipped when running inside Docker. Some Docker versions might work when some Docker privileges are elevated, but Travis' Docker doesn't seem to offer much in that regard. See also: - moby/moby#11740 - moby/moby#7276 - travis-ci/travis-ci#5558
|
@sidazhou even if you could run a build in privileged mode, |
By using |
I haven't seen this mentioned in this thread, but the truth is a bit more complicated. If However, if The best documentation I've found for this oddity so far is this email thread: https://lkml.org/lkml/2015/10/24/134 |
Hello Guys. I am kind of stuck on this for some time now. I have followed all the suggestions mentioned over here. Here is my situation.
Any ideas how to solve this problem? |
Hi, The For instance, the following will create the core file in nsenter -t $1 -m tee /core >/dev/null 2>>/tmp/core.$$ However, this only works if the namespaces still exist at dump time. If the crashed process causes the whole container to exit, then the namespaces won't exist anymore, unfortunately. From what I understand, this is because Linux correctly waits for the core handler to run, but if the whole namespace gets destroyed for another reason (e.g. PID 1 in the container exits), then you're out of luck. I hope this helps! |
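The handler approach can be sketched as a small installer that writes the handler script to disk. The `nsenter` line is the one quoted above; everything else is an assumption for illustration, including the name `write_core_handler` and the install path in the usage comment:

```shell
#!/bin/sh
# Hypothetical helper: write a pipe handler for core_pattern to file $1.
write_core_handler() {
    handler="$1"
    cat > "$handler" <<'EOF'
#!/bin/sh
# $1 is the PID of the crashing process as seen from the host (%P).
# Enter that process's mount namespace and save stdin (the core) as /core.
nsenter -t "$1" -m tee /core >/dev/null 2>>"/tmp/core.$$"
EOF
    chmod +x "$handler"
}

# On the host, as root (the path is an example, not from the thread):
#   write_core_handler /usr/local/bin/core-handler
#   echo '|/usr/local/bin/core-handler %P' > /proc/sys/kernel/core_pattern
```

The handler runs on the host, which is why it needs the host-side PID to find the container's namespaces.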
@jpetazzo Thanks for your excellent tip. Just one quick question: what's %P vs %p? The documentation in man core is not really helping.
|
@praveenmak When running processes in containers (or, generally speaking, with PID namespaces), processes will have one PID per namespace that they belong to; and namespaces are nested. If that feels confusing, you can try the following commands, on a Linux machine: CONTAINERID=$(docker run -d nginx)
docker exec $CONTAINERID ps faux
ps faux | grep nginx You will see that these NGINX processes have a PID inside the container, and a PID outside the container. So that corresponds to %p and %P respectively. I hope that helps! |
The core file is generated, but with "0" bytes. Any idea what could be wrong on my side? |
Hello,
I've been struggling with this on ubuntu 14.10, docker 1.5.0. More than should be the case.
My problems came from the following:
$ ulimit unlimited
is not the same thing as
$ ulimit -c 0
In the container, base image was
ubuntu-debootstrap:14.04
.apport
program. This is no good in containers. It must be overridden to something else if apport
is not installed inside the container.

It would also help if the Linux kernel could have a different value of
/proc/sys/kernel/core_pattern
inside the container than outside it, since the two environments may need to point to different locations. I'm not sure that is something Docker can do anything about, but it might be worth thinking about.

Many thanks for any consideration. Else it will have to be answered on Stack Overflow.