liblxc->start does not close fds in intermediate process #354

Closed
pb-- opened this issue Nov 7, 2014 · 12 comments
Labels
Bug Confirmed to be a bug Incomplete Waiting on more information from reporter

Comments

@pb--

pb-- commented Nov 7, 2014

I am trying to use liblxc in my project. Previously I was using
libvirt and everything was ok, but after switching to lxc I found
one issue.
After invoking lxc->start, ps shows something like this:

_ (1) my-process
_ _ (2) my-process (forked)
_ _ (3) container-init

The problem is with process (2). It appears that this process holds
all the file descriptors copied from process (1). In my case it's a
real problem because one of these file descriptors is a dbus socket,
and after process (1) dies, process (2) still holds process (1)'s
dbus name.
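
The behaviour described above is ordinary fork() semantics: the child gets its own copy of every descriptor, and that copy stays usable after the parent closes its side. A minimal sketch, assuming a POSIX system; the function name `child_keeps_fd_open` is made up for illustration and a pipe stands in for the dbus socket:

```python
import os


def child_keeps_fd_open() -> bytes:
    """Show that a forked child can still use an fd the parent has closed."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it holds its own copies of both pipe ends.
        os.close(read_end)
        os.write(write_end, b"still open")
        os._exit(0)
    # Parent: give up the write end, just as process (1) going away would.
    os.close(write_end)
    data = os.read(read_end, 64)  # arrives via the child's inherited copy
    os.close(read_end)
    os.waitpid(pid, 0)
    return data
```

As long as some process keeps such a copy open, the kernel keeps the underlying object (here a pipe, in the report a dbus connection) alive.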

@hallyn hallyn added the Bug Confirmed to be a bug label Nov 7, 2014
@hallyn hallyn self-assigned this Nov 7, 2014
@hallyn
Member

hallyn commented Nov 11, 2014

Can you show in detail (ls -l /proc/pid/fd of both processes) which fds you think are not being removed?

The lxc_check_inherited() call happens at the top of lxc_start(), so that both the lxc monitor (2) and the lxc init (3) should have all fds closed. In fact, when I try to test this on my laptop, all fds are closed.

So I'd be curious to see which fds are not being closed and how we're doing it wrong in lxc_check_inherited().
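
For anyone gathering the requested information programmatically, the same data as `ls -l /proc/pid/fd` can be read directly. A rough sketch, assuming Linux with /proc mounted; the helper name `list_open_fds` is hypothetical:

```python
import os


def list_open_fds(pid: int) -> dict:
    """Map each open fd of `pid` to its target, like `ls -l /proc/<pid>/fd`."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

Running it once for the original process and once for the forked one, then comparing the two dictionaries, shows which descriptors were carried over. Reading another user's process requires the matching privileges.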

@hallyn hallyn added the Incomplete Waiting on more information from reporter label Nov 11, 2014
@pb--
Author

pb-- commented Nov 12, 2014

This issue is not about lxc monitor, see my ps axf output:
http://codepaste.net/oz5dn2

@pb--
Author

pb-- commented Nov 12, 2014

OK, I've solved the problem: the flag 'close_all_fds' was not set.
It was my fault, I forgot to call want_daemonize(true), but there is still a bug in lxc:

lxccontainer.c line 469, there is a comment: "daemonize implies close_all_fds so set it"
So why is the default lxc state daemonize=1, close_all_fds=0?

@hallyn
Member

hallyn commented Dec 19, 2014

> lxccontainer.c line 469, there is a comment: "daemonize implies close_all_fds so set it"
> So why is the default lxc state daemonize=1, close_all_fds=0?

Looks like an oversight. The default wasn't daemonize=1 until recently,
so the commit changing that default overlooked it.

I think the line setting c->daemonize = true in lxc_container_new should
perhaps call lxcapi_want_daemonize(c, true) instead.

@hallyn
Member

hallyn commented Dec 19, 2014

> Looks like an oversight. The default wasn't daemonize=1 until recently,
> so the commit changing that default overlooked it.
>
> I think the line setting c->daemonize = true in lxc_container_new should
> perhaps call lxcapi_want_daemonize(c, true) instead.

Note, a patch to this effect should be accepted if you send it.
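
The oversight discussed above can be modelled in a few lines. This is a toy model, not liblxc code: the class `ContainerFlags` and its `use_setter` parameter are hypothetical, and only the two flag names and the setter name `want_daemonize` are taken from the thread.

```python
class ContainerFlags:
    """Toy stand-in for the two flags in question; not liblxc's real struct."""

    def __init__(self, use_setter: bool):
        self.daemonize = False
        self.close_all_fds = False
        if use_setter:
            # Suggested fix: establish the default through the setter.
            self.want_daemonize(True)
        else:
            # Reported behaviour: the field is assigned directly,
            # so the implied flag is never touched.
            self.daemonize = True

    def want_daemonize(self, state: bool) -> None:
        self.daemonize = state
        if state:
            # "daemonize implies close_all_fds so set it"
            self.close_all_fds = True
```

With `use_setter=False` the object ends up in the daemonize=1, close_all_fds=0 state from the report; with `use_setter=True` both flags are set. Routing every write through one setter keeps the "implies" rule in a single place.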

@tripledes
Contributor

Hi,

any news on this issue? I think I just hit it from both Ruby and Python bindings.

When I start a recently created container, there's always a leftover process hanging around.

The code I'm using to test is:

https://gist.github.com/tripledes/330533a9589aed1a88d4

With that simple code, the result is that the container is created and started, but there's a process left behind.

root@lxc01:~/liblxc# python3 ../test.py 
Checking cache download in /var/cache/lxc/trusty/rootfs-amd64 ... 
Copy /var/cache/lxc/trusty/rootfs-amd64 to /var/lib/lxc/ubuntu_test/rootfs ... 
Copying rootfs to /var/lib/lxc/ubuntu_test/rootfs ...
Generating locales...
  en_US.UTF-8... up-to-date
Generation complete.
Creating SSH2 RSA key; this may take some time ...
Creating SSH2 DSA key; this may take some time ...
Creating SSH2 ECDSA key; this may take some time ...
Creating SSH2 ED25519 key; this may take some time ...
update-rc.d: warning: default stop runlevel arguments (0 1 6) do not match ssh Default-Stop values (none)
invoke-rc.d: policy-rc.d denied execution of start.

Current default time zone: 'America/Los_Angeles'
Local time is now:      Fri Jan  2 08:22:53 PST 2015.
Universal Time is now:  Fri Jan  2 16:22:53 UTC 2015.


##
# The default user is 'ubuntu' with password 'ubuntu'!
# Use the 'sudo' command to run tasks as root in the container.
##

root@lxc01:~/liblxc# ps axuww | grep python
root      3521  0.0  0.5  62840  5108 ?        Ss   08:22   0:00 python3 ../test.py
root@lxc01:~/liblxc# 

Stracing the whole thing with "-ff" and writing the output to files shows the process calling epoll_create(2), which returns fd 16 (I haven't seen any other syscall return 16 as an fd in the output). Throughout the file there are several calls to epoll_wait with that same fd as the first parameter, and the last line of the file shows:

 epoll_wait(16, 

So, am I missing anything that is causing this behavior? Is it related to this issue, or should I open a new one?

I've been able to reproduce it with both Ruby and Python, so I presume it is liblxc related, but I could be wrong.

Thanks.

UPDATE: I'm able to reproduce the issue using the following C code: https://gist.github.com/tripledes/7f49cee18a8292e4652e (a slightly modified version of the example shown at https://www.stgraber.org/2014/02/05/lxc-1-0-scripting-with-the-api/ ). The command is still shown in the process table after the prompt has returned, and stracing the process shows the same epoll_wait() call.

@hallyn
Member

hallyn commented Jan 5, 2015

Quoting Sergio Jimenez (notifications@github.com):

> Hi,
>
> any news on this issue? I think I just hit it from both Ruby and Python bindings.

No, would you like to send a patch?

@tripledes
Contributor

@hallyn I can try, but I cannot promise anything and I might need some help :-)

I'll be back from holidays in a few days, then I'll try to find out where the issue is.

@lassi-niemisto

I experience a possibly related problem on lxc 1.0.6.

I have a service process (daemon) which uses lxc via system() calls.

Previously, on 0.7.5 (iirc), the following did not make container processes inherit the parent's fds:
system("lxc-start -d -n containerName")

On lxc 1.0.6, the same command does get the fds inherited. Adding -C fixes the situation, even though the manpage says that -d should imply -C.

Could the problem be related to the fact that the calling process is already a daemon?
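
Independent of what lxc-start does, the calling daemon can also control what a spawned command inherits. A small sketch of the difference, assuming a POSIX system; `child_sees_fd` and `CHECK` are hypothetical names, and a throwaway Python helper stands in for lxc-start:

```python
import os
import subprocess
import sys

# Helper program: exits 0 if the fd number given in argv is open, non-zero otherwise.
CHECK = "import os, sys; os.fstat(int(sys.argv[1]))"


def child_sees_fd(fd: int, close_fds: bool) -> bool:
    """Spawn the helper and report whether `fd` is open inside it."""
    result = subprocess.run(
        [sys.executable, "-c", CHECK, str(fd)],
        close_fds=close_fds,
        stderr=subprocess.DEVNULL,  # hide the helper's traceback on failure
    )
    return result.returncode == 0
```

To try it, create a pipe, call `os.set_inheritable(fd, True)` on one end to mimic a plain C descriptor opened without O_CLOEXEC, and compare `child_sees_fd(fd, close_fds=False)` with `child_sees_fd(fd, close_fds=True)`. In a C daemon, the analogous defence is to open descriptors with O_CLOEXEC so that programs started via system() never see them.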

@hallyn
Member

hallyn commented Jan 9, 2015

@tripledes
Actually the patch isn't quite as simple as I'd thought so I've posted a fix to the mailing list.

@tripledes
Contributor

@hallyn thanks for letting us know, do you have a URL to the thread on the mailing list?

@hallyn
Member

hallyn commented Jan 10, 2015

@hallyn hallyn closed this as completed Jan 10, 2015
hallyn added a commit that referenced this issue Jan 12, 2015
When containers request to be daemonized, close-all-fd is
set to true.  But when we switched to daemonize-by-default we didn't
set close-all-fd by default.

Fix that.  In order to do that we have to always have a lxc_conf
object.  As a consequence, after this patch we can drop a bunch
of checks for c->lxc_conf existing.  We should consider removing
those.  This patch does not do that.

This should close #354

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Stéphane Graber <stgraber@ubuntu.com>
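
For readers unfamiliar with what a close-all-fds pass amounts to, here is a rough sketch of the idea. It is not lxc's implementation of lxc_check_inherited(); it assumes Linux with /proc mounted, and both function names are made up for illustration.

```python
import os


def close_inherited_fds(keep=(0, 1, 2)) -> list:
    """Close every open descriptor not listed in `keep`; return those closed."""
    closed = []
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            # Already gone, e.g. the descriptor listdir() itself used.
            continue
        closed.append(fd)
    return sorted(closed)


def cleanup_closes_inherited_fd() -> bool:
    """Fork, run the cleanup in the child, and check an inherited fd is gone."""
    extra_r, extra_w = os.pipe()  # stands in for the inherited dbus socket
    pid = os.fork()
    if pid == 0:
        close_inherited_fds()
        try:
            os.fstat(extra_w)
        except OSError:
            os._exit(0)  # descriptor was closed, as intended
        os._exit(1)  # descriptor survived the cleanup
    os.close(extra_r)
    os.close(extra_w)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

The cleanup is only ever run in the forked child here, since closing everything above stderr in the calling process would also close descriptors the caller still needs.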
z-image pushed a commit to z-image/lxc that referenced this issue Oct 16, 2016