can't start containers with different names but same prefix (xxxxxxxxxxx-1 and xxxxxxxxxxx-2) #13417

Closed
mbiebl opened this issue Aug 28, 2019 · 5 comments
@mbiebl
Contributor

mbiebl commented Aug 28, 2019

systemd version the issue has been seen with

v242

Used distribution

Debian
Filed originally as downstream bug report https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=935948
The following is copied from this bug report verbatim:

Due to IFNAMSIZ, nspawn's network interface names are truncated.
The possibility of collisions should be clearly documented.

My test containers have reasonably long names:

    root@not-omega:~# ls -l /var/lib/machines/
    total 75
    drwxr-xr-x  2 root       root        2 May 15 13:59 alamo
    drwxr-xr-x 21 1087766528 1087766528 21 Aug  8 13:25 dns-test
    drwxr-xr-x 21 1075707904 1075707904 21 Jan  1  2019 my-new-container
    -rw-r--r--  1 root       root        0 Aug  8 17:55 my-new-container.nspawn~
    drwxr-xr-x 21 root       root       21 Aug  7 19:46 nft-test-downstream
    drwxr-xr-x 21 1242628096 1242628096 21 Aug  7 19:48 nft-test-upstream
    drwxr-xr-x 21 1486684160 1486684160 21 May 15 15:32 not-alamo
    drwxr-xr-x 21 1669005312 1669005312 21 Jan  1  2019 test-alloc-1566986334
    drwxr-xr-x 21 1024851968 1024851968 21 Jan  1  2019 test-alloc-1566988389
    drwxr-xr-x 21 1678049280 1678049280 21 Aug  9 01:36 upstream-container
    -rw-rw-r--  1 root       root        0 Aug  8 17:55 upstream-container.nspawn

I noticed that the interfaces created by systemd-nspawn do not use the full name:

    root@not-omega:~# machinectl status test-alloc-1566986334 | grep Iface
               Iface: ve-test-alloc-

    root@not-omega:~# ip -o l | grep alloc
    11: ve-test-alloc-@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000\    link/ether 26:b1:44:74:31:e8 brd ff:ff:ff:ff:ff:ff link-netnsid 0

I wondered what would happen if the "unique" interface happened to
already be in use by another container.

The answer is that systemd-nspawn just crashes with a dumb error:

    + machinectl start test-alloc-1566988389
    Job for systemd-nspawn@test-alloc-1566988389.service failed because the control process exited with error code.
    See "systemctl status systemd-nspawn@test-alloc-1566988389.service" and "journalctl -xe" for details.

    root@not-omega:~# journalctl -u systemd-nspawn@test-alloc-1566988389.service
    -- Logs begin at Wed 2019-05-01 20:15:10 AEST, end at Wed 2019-08-28 20:36:53 AEST. --
    Aug 28 20:36:53 not-omega systemd[1]: Starting Container test-alloc-1566988389...
    Aug 28 20:36:53 not-omega systemd-nspawn[5357]: Failed to add new veth interfaces (ve-test-alloc-:host0): File exists
    Aug 28 20:36:53 not-omega systemd[1]: systemd-nspawn@test-alloc-1566988389.service: Main process exited, code=exited, status=1/FAILURE
    Aug 28 20:36:53 not-omega systemd[1]: systemd-nspawn@test-alloc-1566988389.service: Failed with result 'exit-code'.
    Aug 28 20:36:53 not-omega systemd[1]: Failed to start Container test-alloc-1566988389.
    Aug 28 20:36:53 not-omega systemd[1]: systemd-nspawn@test-alloc-1566988389.service: Consumed 339ms CPU time, no IP traffic.

This limitation is not obvious (to me).
The systemd-nspawn manpage indicates that ve-X should match the machine name.

I THINK this is happening due to IFNAMSIZ in
src/nspawn/nspawn-network.c:setup_veth(), which is:

    src/basic/linux/if.h:32:#define IFNAMSIZ 16
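
To make the suspected truncation concrete, here is a minimal Python sketch (not the nspawn code itself). The helper name `veth_host_name` is hypothetical, and the 14-character limit is an assumption inferred from the `ve-test-alloc-` name that machinectl reports above; it is not read from the nspawn source. Under that assumption, both test-alloc containers map to the same interface name:

```python
IFNAMSIZ = 16  # from src/basic/linux/if.h; the buffer includes the trailing NUL

# Assumption: 14 visible characters, inferred from the observed name
# "ve-test-alloc-" in the machinectl output, not from the nspawn source.
OBSERVED_LIMIT = 14


def veth_host_name(machine: str, limit: int = OBSERVED_LIMIT) -> str:
    """Return the (hypothetical) truncated host-side veth name for a machine."""
    return f"ve-{machine}"[:limit]


first = veth_host_name("test-alloc-1566986334")
second = veth_host_name("test-alloc-1566988389")
print(first, second, first == second)
```

Any two machine names that share their first 11 characters would collide the same way under this assumed limit.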

If this is an unavoidable limitation due to Linux, please at least
warn about it in the systemd-nspawn manpage.

Maybe systemd-nspawn or machinectl could even look for this collision
and specifically warn about it, e.g.

    systemd-nspawn: cannot create interface "ve-X" for container X-2, because another container (X-1) is already using it.  Either rename a container, or use non-default networking (i.e. don't use --network-veth).
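
A check along these lines could be sketched outside of systemd as follows. This is only an illustration in Python: `find_veth_collisions` is a hypothetical helper, and the 14-character limit is an assumption taken from the truncated `ve-test-alloc-` name observed earlier in this report. The machine names are the directories listed under /var/lib/machines above.

```python
from collections import defaultdict


def find_veth_collisions(machines, limit=14):
    """Group machine names by their truncated "ve-" interface name.

    `limit` is an assumption based on the observed name "ve-test-alloc-".
    """
    by_iface = defaultdict(list)
    for name in machines:
        by_iface[f"ve-{name}"[:limit]].append(name)
    # Keep only interface names claimed by more than one machine.
    return {iface: names for iface, names in by_iface.items() if len(names) > 1}


machines = [
    "alamo", "dns-test", "my-new-container", "nft-test-downstream",
    "nft-test-upstream", "not-alamo", "test-alloc-1566986334",
    "test-alloc-1566988389", "upstream-container",
]
for iface, names in find_veth_collisions(machines).items():
    joined = ", ".join(names)
    print(f'interface "{iface}" would be shared by: {joined}')
```

Running such a check before starting a container would allow a specific warning instead of the generic "File exists" failure.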

A quick test of a (non-systemd) client suggests this is indeed a fundamental constraint:

    root@not-omega:~# ip link add waffle type veth peer jaffa
    root@not-omega:~# ip link set waffle name wafflexxxxxxxxxxxxxx
    Error: argument "wafflexxxxxxxxxxxxxx" is wrong: "name" not a valid ifname
    root@not-omega:~# ip link set waffle name wafflexxxxxxxxxxx
    Error: argument "wafflexxxxxxxxxxx" is wrong: "name" not a valid ifname
    root@not-omega:~# ip link set waffle name wafflexxxxxxxx
    root@not-omega:~#
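
The three results above are consistent with a simple length rule: an interface name has to fit in IFNAMSIZ bytes including the trailing NUL, so at most 15 bytes are usable. A small Python sketch of that rule follows; `fits_ifnamsiz` is a hypothetical helper that checks length only and ignores any other validity rules that ip or the kernel may apply.

```python
IFNAMSIZ = 16  # size of the name buffer, including the trailing NUL


def fits_ifnamsiz(name: str) -> bool:
    """Length check only: non-empty and at most IFNAMSIZ - 1 bytes."""
    return 0 < len(name.encode()) < IFNAMSIZ


# The three names tried with "ip link set waffle name ..." above.
for candidate in ("wafflexxxxxxxxxxxxxx", "wafflexxxxxxxxxxx", "wafflexxxxxxxx"):
    print(candidate, len(candidate), fits_ifnamsiz(candidate))
```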

PS: systemd-nspawn is picky about container names (e.g. can't have an
underscore). If this is ultimately based on RFC 952, note that
RFC 952 allows up to 24 bytes (longer than IFNAMSIZ).

mbiebl added the nspawn label Aug 28, 2019
@arianvp
Contributor

arianvp commented Aug 28, 2019

I opened this issue too, and it was closed with a PR that documents this limitation: #10721. Do the merged docs there answer your question?

@mbiebl
Contributor Author

mbiebl commented Aug 28, 2019

I've forwarded your question to the original bug reporter (yay for playing bug proxy).

@poettering
Member

#12865 is your fix, but the PR appears to be stale. Does anyone want to pick that one up?

@poettering
Member

/cc @kakra

@kakra
Contributor

kakra commented Aug 28, 2019

@poettering Yes, that would be the fix. I'm soon going to resume working on it because we are slowly moving our staging environment to production.
