iptables-save does not work inside lxd containers #1978

Closed
pgassmann opened this issue May 1, 2016 · 19 comments

@pgassmann

commented May 1, 2016

Issue description

The iptables-save command returns no output, while iptables -nL works as expected.

iptables-save is required to manage the firewall with Puppet inside the containers, because Puppet relies on its output to decide at which position to add a rule. Currently, the final drop-all rule gets added at position 0, which instantly blocks all traffic to that container.
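
To illustrate why an empty dump is harmful to a tool that orders rules by position, here is a minimal sketch. It is not Puppet's actual logic; count_chain_rules is a hypothetical helper that counts the "-A CHAIN" lines of a dump read from stdin, across all tables in that dump.

```shell
#!/bin/sh
# Count the rules appended to a chain in iptables-save output read from stdin.
# $1 is the chain name, e.g. INPUT. (Hypothetical helper, for illustration only.)
count_chain_rules() {
    awk -v chain="$1" '$1 == "-A" && $2 == chain { n++ } END { print n + 0 }'
}

# Hypothetical usage inside the container:
#   iptables-save | count_chain_rules INPUT
```

With an empty dump the count is 0, so a tool working from that number would believe the chain is empty even though iptables -nL shows rules.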

Steps to reproduce

  1. Create a new container: lxc launch ubuntu:xenial iptables-test
  2. Enter the container: lxc exec iptables-test -- bash
  3. Add an iptables rule: iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
  4. List the rules: iptables -nL
  5. Run iptables-save
  6. No output is returned

Required information

  • Distribution: Ubuntu 16.04 Xenial
  • The output of "lxc info" or if that fails:
    apicompat: 0
    auth: trusted
    environment:
      addresses: []
      architectures:
      - x86_64
      - i686
      driver: lxc
      driverversion: 2.0.0
      kernel: Linux
      kernelarchitecture: x86_64
      kernelversion: 4.4.0-21-generic
      server: lxd
      serverpid: 9073
      serverversion: 2.0.0
      storage: btrfs
      storageversion: "4.4"
  • Storage backend in use: Btrfs
@stgraber

Member

commented May 1, 2016

This is most likely because some table that iptables-save is trying to dump isn't accessible as the matching kernel module isn't loaded.

Containers cannot cause automatic kernel module loading, so it just fails.

You may want to run it under strace to see exactly what's going on.
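
A minimal sketch of how such a trace could be narrowed down, assuming strace is installed in the container. failed_opens is a hypothetical helper name; it only filters trace lines for open()/openat() calls that returned an error.

```shell
#!/bin/sh
# Print only the open()/openat() calls that failed, reading strace output on stdin.
# (Hypothetical helper; the optional "[pid N]" prefix appears when strace -f is used.)
failed_opens() {
    grep -E '^(\[pid +[0-9]+\] )?open(at)?\(.*\) += -1 E[A-Z]+'
}

# Hypothetical usage inside the container. strace writes its trace to stderr,
# so stderr is sent into the pipe and the command's own stdout is discarded:
#   strace -f -e trace=open,openat iptables-save 2>&1 >/dev/null | failed_opens
```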

@pgassmann

Author

commented May 1, 2016

It has no permission to open /proc/net/ip_tables_names:

open("/proc/net/ip_tables_names", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)

root@iptables-test:~# strace iptables-save
execve("/sbin/iptables-save", ["iptables-save"], [/* 11 vars */]) = 0
brk(NULL)                               = 0x1e65000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8b2320e000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=20483, ...}) = 0
mmap(NULL, 20483, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8b23208000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libip4tc.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\26\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=27424, ...}) = 0
mmap(NULL, 2122496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8b22de4000
mprotect(0x7f8b22dea000, 2093056, PROT_NONE) = 0
mmap(0x7f8b22fe9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x7f8b22fe9000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libip6tc.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\27\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=27456, ...}) = 0
mmap(NULL, 2122528, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8b22bdd000
mprotect(0x7f8b22be3000, 2093056, PROT_NONE) = 0
mmap(0x7f8b22de2000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x7f8b22de2000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libxtables.so.11", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\200/\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=51872, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8b23207000
mmap(NULL, 2148792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8b229d0000
mprotect(0x7f8b229db000, 2097152, PROT_NONE) = 0
mmap(0x7f8b22bdb000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb000) = 0x7f8b22bdb000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\t\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1864888, ...}) = 0
mmap(NULL, 3967488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8b22607000
mprotect(0x7f8b227c7000, 2093056, PROT_NONE) = 0
mmap(0x7f8b229c6000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f8b229c6000
mmap(0x7f8b229cc000, 14848, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f8b229cc000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\r\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14608, ...}) = 0
mmap(NULL, 2109680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8b22403000
mprotect(0x7f8b22406000, 2093056, PROT_NONE) = 0
mmap(0x7f8b22605000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f8b22605000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8b23206000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8b23205000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8b23204000
arch_prctl(ARCH_SET_FS, 0x7f8b23205700) = 0
mprotect(0x7f8b229c6000, 16384, PROT_READ) = 0
mprotect(0x7f8b22605000, 4096, PROT_READ) = 0
mprotect(0x7f8b22bdb000, 4096, PROT_READ) = 0
mprotect(0x7f8b22de2000, 4096, PROT_READ) = 0
mprotect(0x7f8b22fe9000, 4096, PROT_READ) = 0
mprotect(0x613000, 4096, PROT_READ)     = 0
mprotect(0x7f8b23210000, 4096, PROT_READ) = 0
munmap(0x7f8b23208000, 20483)           = 0
brk(NULL)                               = 0x1e65000
brk(0x1e86000)                          = 0x1e86000
open("/proc/net/ip_tables_names", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
exit_group(0)                           = ?
+++ exited with 0 +++

@stgraber

Member

commented May 1, 2016

So there are two bugs here:

  1. iptables-save should exit 1 and print an error; this should be reported to the iptables guys
  2. If /proc/net/ip_tables_names is safe for unprivileged users to read, then the kernel should be changed to reflect that.
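
Until iptables-save reports this failure itself, a caller can guard against the silent case. The following is a minimal sketch, assuming that at least one table is expected to be loaded so that an empty dump really indicates a problem; dump_or_fail is a hypothetical wrapper name.

```shell
#!/bin/sh
# Run a rule-dumping command and fail loudly if it exits non-zero or prints nothing.
# "$@" is the command to run, e.g. iptables-save. (Hypothetical wrapper.)
dump_or_fail() {
    out=$("$@") || return 1
    if [ -z "$out" ]; then
        echo "error: '$*' printed nothing (is /proc/net/ip_tables_names readable?)" >&2
        return 1
    fi
    printf '%s\n' "$out"
}

# Hypothetical usage inside the container:
#   dump_or_fail iptables-save > /tmp/rules.v4 || echo "refusing to continue"
```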

In the meantime, if you absolutely need this to work, your only hope is to switch to a privileged container (security.privileged=true), with the security issues that come with this.
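
For reference, a sketch of how that switch could be made from the LXD host. make_privileged is a hypothetical helper name, and the restart is included on the assumption that the setting only takes effect when the container is started again.

```shell
#!/bin/sh
# Switch an existing container to privileged mode and restart it.
# $1 is the container name. This gives up user-namespace isolation.
make_privileged() {
    lxc config set "$1" security.privileged true &&
    lxc restart "$1"
}

# Hypothetical usage on the LXD host:
#   make_privileged iptables-test
```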

I'm closing this issue as there's unfortunately nothing LXD can do about this, it's an iptables & kernel bug.

@stgraber stgraber closed this May 1, 2016

@pgassmann

Author

commented May 1, 2016

Reported in iptables/netfilter project: https://bugzilla.netfilter.org/show_bug.cgi?id=1064

@pgassmann

Author

commented May 3, 2016

The bug should be fixed in kernel 4.5, according to the comments in the netfilter bug: http://bugzilla.netfilter.org/show_bug.cgi?id=1064#c3

http://git.kernel.org/cgit/linux/kernel/git/pablo/nf-next.git/commit/?id=f13f2aeed154da8e48f90b85e720f8ba39b1e881

Can anyone test this? I don't know how to quickly test this with a newer kernel.

@pgassmann

Author

commented May 4, 2016

iptables-save still does not work with the newer kernels.

The new comment on the bug is:

Regarding the kernel patch, it requires the following sequence of system calls,
so that a mapping for root is available before the network namespace is
created:

unshare(CLONE_NEWUSER);
/* Setup any mappings */
unshare(CLONE_NEWNET);

I expect lxc, since it predates the patch just unshares the network namespace
at the same time as the user namespace, which will not have the desired effect
in this case.

I don't know how lxc works; are unprivileged containers started direct from the
command line or via a daemon? If the former, could someone try running it with
"unshare -r"?
https://bugzilla.netfilter.org/show_bug.cgi?id=1064#c9

@stgraber Can you answer this?
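
The two orderings discussed in the quoted comment can be compared outside of lxc with the util-linux unshare tool. This is only an experiment sketch, not how lxc starts containers: it assumes unprivileged user namespaces are allowed on the host, and the function names are hypothetical.

```shell
#!/bin/sh
# Two-step: "unshare -r" creates the user namespace and maps the caller to root,
# then the inner "unshare -n" creates the network namespace with that mapping in place.
probe_two_step() {
    unshare -r sh -c 'unshare -n cat /proc/net/ip_tables_names'
}

# One-shot: both namespaces are requested from a single unshare invocation,
# so the network namespace exists before the root mapping is written.
probe_one_shot() {
    unshare -r -n cat /proc/net/ip_tables_names
}

# Hypothetical usage on a host with a patched kernel:
#   probe_two_step; echo "two-step exit: $?"
#   probe_one_shot; echo "one-shot exit: $?"
```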

@stgraber

Member

commented May 4, 2016

We use clone() rather than unshare() and do pass all the flags in one shot as that's how clone() works.

unshare() isn't available on some kernels that we support so we'd need to make things a fair bit more clever...

It also just feels wrong having to do that stuff in two steps to begin with.

@hallyn can you look into this?

@hallyn

Member

commented May 5, 2016

I'll take a look, but it's not my highest priority. Shout if you feel it is somewhat urgent.

@hallyn

Member

commented May 9, 2016

Ugh. So yes. This requires lxc to first unshare userns and then
the rest. Mind you for the most part the kernel does the right
things so we shouldn't have to - the userns gets unshared first,
immediately, then the network (etc), so that network, mount, etc
have ->owner pointing to the new user_ns. After all this is how
the container admin can administer the container network. But
indeed for the uid setting of these files we would need the
root uid to be defined ahead of time.

One might be able to make a case for this being a kernel regression
as that has not been necessary before. The way to 'fix' it in the
kernel would be to set the uid/gid to k_uid -1 if no mapping is
defined yet, and always check at read time whether the root_uid is
-1 and calculate it then. However that's a bit messy.

To fix it in lxc, we would need to detect CLONE_NEWUSER, and in that
case do the lxc_map_ids immediately, then unshare the rest of the
clone_flags and proceed. This becomes extra ugly due to implications
of unsharing vs cloning of PID_NS.

@pgassmann

Author

commented May 9, 2016

That's now too low-level for me. Can you reopen this issue or create a new one, and also comment on the netfilter bug tracker? http://bugzilla.netfilter.org/show_bug.cgi?id=1064#c3

@hallyn

Member

commented May 9, 2016

Actually I have a fix for it in lxc at
lxc/lxc#1014

Since this issue was a lxd one re-opening wouldn't really be
helpful. However thanks very much for the information in it,
it was very helpful.

@pgassmann

Author

commented May 9, 2016

Great! (When) will this fix become available in default lxc on ubuntu 16.04?

@hallyn

Member

commented May 9, 2016

@pgassmann

Author

commented May 23, 2016

@hallyn @stgraber The release contains a fix for /proc/net access. lxc/lxc#1014

I just tested the new release by enabling the xenial-proposed repository. iptables does not yet work with the xenial kernel.

I then tested with kernel linux-4.5, which was successful!

ubuntulxd@lxd1:~$ lxc config set unsparked-everette raw.lxc 'lxc.aa_allow_incomplete = 1'
ubuntulxd@lxd1:~$ lxc start unsparked-everette 
ubuntulxd@lxd1:~$ lxc exec unsparked-everette -- bash
root@unsparked-everette:~# iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
root@unsparked-everette:~# iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:22

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
root@unsparked-everette:~# iptables-save 
# Generated by iptables-save v1.4.21 on Mon May 23 14:17:42 2016
*mangle
:PREROUTING ACCEPT [21:2226]
:INPUT ACCEPT [19:1536]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [20:1460]
:POSTROUTING ACCEPT [20:1460]
COMMIT
# Completed on Mon May 23 14:17:42 2016
# Generated by iptables-save v1.4.21 on Mon May 23 14:17:42 2016
*nat
:PREROUTING ACCEPT [2:690]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [6:428]
:POSTROUTING ACCEPT [6:428]
COMMIT
# Completed on Mon May 23 14:17:42 2016
# Generated by iptables-save v1.4.21 on Mon May 23 14:17:42 2016
*filter
:INPUT ACCEPT [7:560]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [8:576]
-A INPUT -i eth0 -p tcp -m tcp --dport 22 -j ACCEPT
COMMIT
# Completed on Mon May 23 14:17:42 2016
root@unsparked-everette:~# uname -a
Linux unsparked-everette 4.5.2-040502-generic #201604200335 SMP Wed Apr 20 07:37:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

The kernel fix also needs to be backported to the 4.4 xenial kernel:
https://bugzilla.netfilter.org/show_bug.cgi?id=1064

@stgraber

Member

commented May 23, 2016

@sforshee do you know if that kernel change can make it into our xenial kernel?

@sforshee


commented May 23, 2016

If the patch goes to upstream 4.4 stable as mentioned on bugzilla then xenial will get it from there once that happens. If a fix to Ubuntu is needed quickly then we should be able to pick it up for the next xenial SRU cycle (release ~4 weeks from now), in which case someone will need to open a bug in launchpad and assign it to me asap.

@pgassmann

Author

commented May 23, 2016

It would be great if this could be backported asap, as it makes it possible to manage the firewall within lxd instances using Puppet (and probably other configuration management systems), and to use iptables-save manually.

@sforshee


commented May 23, 2016

@pgassmann could you file a bug in launchpad please? All updates to stable releases require that a bug be filed.

https://bugs.launchpad.net/ubuntu/+source/linux/+filebug

@pgassmann
