lxc 5.0.1 does not autostart container after unpriviledged user slice has been established #4232

Open
tglaeser opened this issue Nov 28, 2022 · 14 comments
Labels
Incomplete Waiting on more information from reporter


@tglaeser

Required information

  • Distribution: Gentoo
  • Distribution version: latest
  • The output of
    • lxc-start --version: 5.0.1
    • uname -a: Linux lipari 5.15.75-gentoo #6 SMP Sun Nov 27 18:13:41 EST 2022 x86_64 12th Gen Intel(R) Core(TM) i7-1260P GenuineIntel GNU/Linux
    • cat /proc/self/cgroup: 0::/user.slice/user-0.slice/session-5.scope

Issue description

Invoking

$ systemctl --user restart lxc@desktop.service

from a terminal after logging in as user admin works just fine, so I would expect that after executing

$ systemctl --user enable lxc@desktop.service

the service auto-starts during login, however

$ systemctl --user status lxc@desktop.service
○ lxc@desktop.service - LXC Container: desktop
     Loaded: loaded (/home/admin/.config/systemd/user/lxc@.service; enabled; preset: enabled)
     Active: inactive (dead)
       Docs: man:lxc-start
             man:lxc

is observed after login.

Further information

I have a working system running lxc 4.0.12 and systemd 251 using file .config/systemd/user/lxc@.service:

[Unit]
Description=LXC container %I
After=network.target

[Service]
Type=forking
ExecStart=/usr/bin/lxc-start -n %i
ExecStop=/usr/bin/lxc-stop -n %i
StandardOutput=journal
Delegate=yes

[Install]
WantedBy=default.target

On the new system running lxc 5.0.1 and systemd 251, the old file .config/systemd/user/lxc@.service from above no longer works, so I copied the default from /lib/systemd/system/lxc@.service to /home/admin/.config/systemd/user/lxc@.service:

[Unit]
Description=LXC Container: %i
# This pulls in apparmor, dev-setup, lxc-net
After=lxc.service
Wants=lxc.service
Documentation=man:lxc-start man:lxc

[Service]
Type=simple
KillMode=mixed
TimeoutStopSec=120s
ExecStart=/usr/bin/lxc-start -F -n %i
ExecStop=/usr/bin/lxc-stop -n %i
# Environment=BOOTUP=serial
# Environment=CONSOLETYPE=serial
Delegate=yes

[Install]
WantedBy=multi-user.target

With this configuration the above described behavior can be observed.

To me it seems that changes to the systemd configuration are needed when upgrading lxc from version 4.0.12 to 5.0.1.

What am I missing; how exactly should I configure systemd for an unprivileged user so that the container auto-starts while logging in?

@tglaeser
Author

Hi @stgraber - Any pointers for me this time around?

@tglaeser
Author

tglaeser commented Dec 3, 2022

Thinking more about what I did above, simply copying /lib/systemd/system/lxc@.service to /home/admin/.config/systemd/user/lxc@.service probably doesn't do it, so I made some changes to it:

$ cat ~/.config/systemd/user/lxc\@.service
[Unit]
Description=LXC Container: %i
After=default.target
Wants=default.target
Documentation=man:lxc-start man:lxc

[Service]
Type=simple
KillMode=mixed
TimeoutStopSec=120s
ExecStart=/usr/bin/lxc-start -F -n %i
ExecStop=/usr/bin/lxc-stop -n %i
# Environment=BOOTUP=serial
# Environment=CONSOLETYPE=serial
Delegate=yes

[Install]
WantedBy=default.target

At least now I get a meaningful error:

$ journalctl --user --boot
systemd[948]: Queued start job for default target Main User Target.
systemd[948]: Created slice User Application Slice.
systemd[948]: Created slice Slice /app/lxc.
systemd[948]: Reached target Paths.
systemd[948]: Reached target Timers.
systemd[948]: Starting D-Bus User Message Bus Socket...
systemd[948]: Listening on D-Bus User Message Bus Socket.
systemd[948]: Reached target Sockets.
systemd[948]: Reached target Basic System.
systemd[948]: Reached target Main User Target.
systemd[948]: lxc@desktop.service: unit configures an IP firewall, but not running as root.
systemd[948]: (This warning is only shown for the first unit using IP firewalling.)
systemd[948]: Started LXC Container: desktop.
systemd[948]: Startup finished in 58ms.
systemd[948]: Created slice User Core Session Slice.
systemd[948]: Starting D-Bus User Message Bus...
systemd[948]: Started D-Bus User Message Bus.
dbus-daemon[956]: [session uid=1000 pid=956] Successfully activated service 'org.freedesktop.systemd1'
systemd[948]: Started lxc-desktop-0.scope.
lxc-start[955]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/network.c: lxc_create_network_unpriv_exec: 2990 lxc-user-nic failed to configure requested network: ../lxc-5.0.1/src/lxc/cmd/lxc_user_nic.c: 474: instantiate_veth - Invalid argument - Failed to>
lxc-start[955]: ../lxc-5.0.1/src/lxc/cmd/lxc_user_nic.c: 529: create_nic: Error creating veth tunnel
lxc-start[955]: ../lxc-5.0.1/src/lxc/cmd/lxc_user_nic.c: 720: get_nic_if_avail: Failed to create new nic
lxc-start[955]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/start.c: lxc_spawn: 1840 Failed to create the network
lxc-start[955]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/start.c: __lxc_start: 2107 Failed to spawn container "desktop"
lxc-start[955]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/tools/lxc_start.c: main: 306 The container failed to start
lxc-start[955]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/tools/lxc_start.c: main: 311 Additional information can be obtained by setting the --logfile and --logpriority options
systemd[948]: lxc@desktop.service: Main process exited, code=exited, status=1/FAILURE
systemd[948]: lxc@desktop.service: Failed with result 'exit-code'.
systemd[948]: Reloading.

Unfortunately in the user scope there is no network.target I could set After/Wants to.

So it seems like the whole issue comes down to the systemd configuration file lxc@.service for user services.

Again, if I log in, open a terminal, and execute systemctl --user start lxc@desktop.service everything is working just fine.

Any pointers on how to delay the user service startup until all prerequisites are met? There must be some documentation on how to configure an LXC user service.

@tglaeser
Author

tglaeser commented Jan 1, 2023

Anyone, please? You must have an answer for how to auto-start an unprivileged container.

@mihalicyn
Member

mihalicyn commented Jan 5, 2023

Hi @tglaeser!

Please show the full log line for:

requested network: ../lxc-5.0.1/src/lxc/cmd/lxc_user_nic.c: 474: instantiate_veth - Invalid argument - Failed to>

Have you tried starting the container manually by invoking lxc-start directly from an interactive terminal? Does that work?

@tglaeser
Author

tglaeser commented Jan 6, 2023

Sorry, I didn't notice that the logging was cut off; here are the full lxc-start lines:

$ journalctl --user --boot | grep lxc-start
lxc-start[1394]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/network.c: lxc_create_network_unpriv_exec: 2990 lxc-user-nic failed to configure requested network: ../lxc-5.0.1/src/lxc/cmd/lxc_user_nic.c: 474: instantiate_veth - Invalid argument - Failed to create veth1000_UTWW-veth1000_UTWWp
lxc-start[1394]: ../lxc-5.0.1/src/lxc/cmd/lxc_user_nic.c: 529: create_nic: Error creating veth tunnel
lxc-start[1394]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/start.c: lxc_spawn: 1840 Failed to create the network
lxc-start[1394]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/start.c: __lxc_start: 2107 Failed to spawn container "desktop"
lxc-start[1394]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/tools/lxc_start.c: main: 306 The container failed to start
lxc-start[1394]: lxc-start: desktop: ../lxc-5.0.1/src/lxc/tools/lxc_start.c: main: 311 Additional information can be obtained by setting the --logfile and --logpriority options

Running commands lxc-start desktop as well as systemctl --user start lxc@desktop.service after logging in works just fine.

In this case the following network interface gets created successfully.

9: veth1000_fbXp@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000
    link/ether fe:c1:df:df:b7:9a brd ff:ff:ff:ff:ff:ff link-netnsid 1

Compared to this, the failed interface name veth1000_UTWW-veth1000_UTWWp seems to follow a very different naming convention.

@mihalicyn
Member

My guess is that the problem comes from:

static int get_mtu(char *name)
{
	int idx;

	idx = if_nametoindex(name);
	if (idx < 0)
		return -1; // <<<<< 

///


int lxc_veth_create(const char *name1, const char *name2, pid_t pid, unsigned int mtu,
                    int n_rxqueues, int n_txqueues) //<< mtu is unsigned int
{

...

	if (mtu > 0 && nla_put_u32(nlmsg, IFLA_MTU, mtu)) // mtu > 0 because mtu is unsigned and contains `-1` as a value...
		return ret_errno(ENOMEM);

Then in the kernel:

int dev_validate_mtu(struct net_device *dev, int new_mtu,
		     struct netlink_ext_ack *extack)
{
	/* MTU must be positive, and in range */
	if (new_mtu < 0 || new_mtu < dev->min_mtu) { // <<< fail here
		NL_SET_ERR_MSG(extack, "mtu less than device minimum");
		return -EINVAL;
	}

	if (dev->max_mtu > 0 && new_mtu > dev->max_mtu) {
		NL_SET_ERR_MSG(extack, "mtu greater than device maximum");
		return -EINVAL;
	}
	return 0;
}

The root cause is that the bridge device doesn't exist when the container starts.
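
The chain above can be reproduced in isolation. What follows is a minimal sketch, not LXC or kernel code: the helper names (`fake_get_mtu`, `mtu_attr_would_be_sent`, `kernel_rejects_mtu`, `trace_mtu`) are hypothetical stand-ins, and the minimum MTU is passed in as a parameter because the real value depends on the device.

```c
#include <stdio.h>

/* Hypothetical stand-in for get_mtu(): returns -1 when the bridge
 * lookup fails, e.g. because the bridge does not exist yet. */
static int fake_get_mtu(int bridge_exists)
{
	return bridge_exists ? 1500 : -1;
}

/* Same shape as the check in lxc_veth_create(): the parameter is
 * unsigned, so a -1 passed in has already become UINT_MAX here. */
static int mtu_attr_would_be_sent(unsigned int mtu)
{
	return mtu > 0;
}

/* Same shape as the first check in dev_validate_mtu(): the value is
 * looked at as a signed int again. Converting UINT_MAX to int is
 * implementation-defined in C; on the usual two's-complement
 * compilers it yields -1. */
static int kernel_rejects_mtu(unsigned int mtu_on_wire, int min_mtu)
{
	int new_mtu = (int)mtu_on_wire;

	return new_mtu < 0 || new_mtu < min_mtu;
}

/* Prints what happens to one get_mtu() result along the way. */
static void trace_mtu(int bridge_exists, int min_mtu)
{
	unsigned int mtu = (unsigned int)fake_get_mtu(bridge_exists);

	printf("bridge_exists=%d mtu=%u sent=%d rejected=%d\n",
	       bridge_exists, mtu, mtu_attr_would_be_sent(mtu),
	       kernel_rejects_mtu(mtu, min_mtu));
}
```

Calling `trace_mtu(0, 68)` walks the failure path (the attribute is sent and then rejected), while `trace_mtu(1, 68)` walks the normal path; 68 is only an illustrative minimum.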

mihalicyn added a commit to mihalicyn/lxc that referenced this issue Jan 6, 2023
get_mtu() returns int, but "mtu" variable has unsigned int type.
It leads to logical error in error handling, which can end up
with strange -EINVAL error in lxc_veth_create(), cause (mtu > 0)
condition is met, but negative "mtu" value is too large when set
as mtu for network device.

Issue lxc#4232

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
@mihalicyn
Member

@tglaeser just to conclude, you need to ensure that the bridge device is created before you try to start the container.

My PR #4252 only improves the error handling for the case where the bridge doesn't exist.

@tglaeser
Author

tglaeser commented Jan 6, 2023

Your analysis seems to be correct:

# journalctl --boot | grep br0
systemd-networkd[1376]: br0: Link UP

So the bridge came up at 21:15:45, while the lxc-start command ran at 21:15:28, i.e. 17 seconds before the bridge existed.

But the bridge is created as user root and the unprivileged container is started as user admin; how can I tell systemd that a user service depends on a root (system) service?

stgraber pushed a commit that referenced this issue Jan 6, 2023
get_mtu() returns int, but "mtu" variable has unsigned int type.
It leads to logical error in error handling, which can end up
with strange -EINVAL error in lxc_veth_create(), cause (mtu > 0)
condition is met, but negative "mtu" value is too large when set
as mtu for network device.

Issue #4232

Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
@tglaeser
Author

tglaeser commented Jan 7, 2023

My current workaround is

$ systemctl --user disable lxc@desktop.service
$ echo -e "[Timer]\nOnStartupSec=10\n\n[Install]\nWantedBy=timers.target" | tee .config/systemd/user/lxc\@.timer
[Timer]
OnStartupSec=10

[Install]
WantedBy=timers.target
$ systemctl --user enable lxc@desktop.timer

which delays the startup of the container by 10 seconds. I don't really like that fixed-delay approach and would rather declare a Requires= dependency.
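
One alternative to a fixed delay is to poll until the bridge interface actually exists. The following is a minimal sketch under stated assumptions: `wait_for_interface` is a hypothetical helper, not part of LXC or systemd, and it only checks that an interface with the given name exists (via `if_nametoindex()`), not that its link is up.

```c
#include <net/if.h>
#include <time.h>

/* Polls until a network interface called `name` exists.
 * Returns 0 once it is found, -1 if it is still missing after
 * `attempts` checks spaced `delay_ms` milliseconds apart.
 * An interrupted sleep is not retried; the loop simply moves on
 * to the next check. */
static int wait_for_interface(const char *name, int attempts, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	for (int i = 0; i < attempts; i++) {
		/* if_nametoindex() returns 0 when no such interface exists. */
		if (if_nametoindex(name) != 0)
			return 0;
		if (i + 1 < attempts)
			nanosleep(&delay, NULL);
	}
	return -1;
}
```

A small program whose `main()` returns the result of, say, `wait_for_interface("br0", 60, 500)` could then be run through an `ExecStartPre=` line in the user unit, so that lxc-start only runs once the bridge is present. The attempt count and interval here are arbitrary illustrative values.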

@tglaeser
Author

tglaeser commented Jan 8, 2023

The above-mentioned workaround only works when logging in as unprivileged user admin right away; otherwise the journal, read as root, shows these errors:

# journalctl --boot | grep lxc-start
lxc-start: desktop: ../lxc-5.0.1/src/lxc/conf.c: lxc_map_ids: 3672 newuidmap failed to write mapping "newuidmap: uid range [0-1000) -> [1000000-1001000) not allowed": newuidmap 1398 0 1000000 1000 1000 1000 1 1001 1001001 64535
lxc-start: desktop: ../lxc-5.0.1/src/lxc/start.c: lxc_spawn: 1788 Failed to set up id mapping.
lxc-start: desktop: ../lxc-5.0.1/src/lxc/start.c: __lxc_start: 2107 Failed to spawn container "desktop"
lxc-start: desktop: ../lxc-5.0.1/src/lxc/conf.c: lxc_map_ids: 3672 newuidmap failed to write mapping "newuidmap: uid range [0-1000) -> [1000000-1001000) not allowed": newuidmap 1408 65536 396 1 0 1000000 1000
lxc-start: desktop: ../lxc-5.0.1/src/lxc/conf.c: userns_exec_1: 5070 Error setting up {g,u}id mappings for child process "1408"
lxc-start: desktop: ../lxc-5.0.1/src/lxc/tools/lxc_start.c: main: 306 The container failed to start
lxc-start: desktop: ../lxc-5.0.1/src/lxc/tools/lxc_start.c: main: 311 Additional information can be obtained by setting the --logfile and --logpriority options

This I don't quite understand: I have not yet logged in as unprivileged user admin, therefore the user application slice has not yet been activated; why then has the lxc-start command been called for my unprivileged container?

@mihalicyn
Member

This is more about systemd than LXC. Unfortunately, I can't give you quick advice on service dependencies.

@tglaeser
Author

tglaeser commented Jan 9, 2023

I understand that the answer to the problem described here relates to the systemd configuration, but I would still expect that the LXC team has an answer to it given that this is the recommended setup; you guys must have something in mind on how the user and root services are supposed to work together, don't you?

@mihalicyn
Member

Sorry for the long delay, @tglaeser

I have re-read this thread. Could you restate the question that you still have?

As far as I understand, the problem was a race condition between bridge creation and container start. You have now fixed that by adjusting the configuration a bit.

The last question is why you get errors when logging in as root. From what I see in your logs:

newuidmap failed to write mapping "newuidmap: uid range [0-1000) -> [1000000-1001000) not allowed": newuidmap 1398 0 1000000 1000 1000 1000 1 1001 1001001 64535

The uid range is not allowed for the root user; that is the reason.
Please show the output of cat /etc/subuid.
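
The arguments in that log line (`0 1000000 1000`) ask newuidmap to map container uids 0-999 onto host uids 1000000-1000999, and newuidmap only accepts host ranges that /etc/subuid delegates to the calling user. A simplified model of that containment check is sketched below; `struct subid_range` and `host_range_allowed` are hypothetical names, and the real tool performs further checks that are not modelled here.

```c
#include <stddef.h>

/* One "user:start:count" entry from /etc/subuid, without the user name. */
struct subid_range {
	unsigned long start;
	unsigned long count;
};

/* Returns 1 if the host range [start, start + count) lies completely
 * inside one of the delegated ranges, 0 otherwise. */
static int host_range_allowed(const struct subid_range *ranges, size_t n,
			      unsigned long start, unsigned long count)
{
	for (size_t i = 0; i < n; i++) {
		unsigned long lo = ranges[i].start;
		unsigned long hi = lo + ranges[i].count; /* exclusive end */

		if (start >= lo && start + count <= hi)
			return 1;
	}
	return 0;
}
```

With an illustrative entry of `1000000:65536` for the calling user, the requested range 1000000-1000999 is accepted; with no entry for that user, the same request is refused, which matches the "not allowed" message in the log.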

@mihalicyn mihalicyn added the Incomplete Waiting on more information from reporter label Feb 23, 2024
@tglaeser
Author

tglaeser commented Feb 23, 2024

I haven't tried this for some time now, instead I had configured

$ systemctl --user disable lxc@desktop

and would start the container manually when needed. That always worked.

Now, to verify if this is still a problem, I ran

$ systemctl --user enable lxc@desktop

again, rebooted the host, waited some time, logged in as user admin, and this time it is working as expected:

$ journalctl --user --boot
systemd[3260]: Queued start job for default target Main User Target.
systemd[3260]: Created slice User Application Slice.
dbus-daemon[3272]: [session uid=1000 pid=3272] Successfully activated service 'org.freedesktop.systemd1'
systemd[3260]: Created slice Slice /app/lxc.
systemd[3260]: Reached target Paths.
systemd[3260]: Reached target Timers.
systemd[3260]: Starting D-Bus User Message Bus Socket...
systemd[3260]: Listening on D-Bus User Message Bus Socket.
systemd[3260]: Reached target Sockets.
systemd[3260]: Reached target Basic System.
systemd[3260]: lxc@desktop.service: unit configures an IP firewall, but not running as root.
systemd[3260]: lxc@desktop.service: (This warning is only shown for the first unit using IP firewalling.)
systemd[3260]: Started LXC Container: desktop.
systemd[3260]: Reached target Main User Target.
systemd[3260]: Startup finished in 203ms.
systemd[3260]: Created slice User Core Session Slice.
systemd[3260]: Starting D-Bus User Message Bus...
systemd[3260]: Started D-Bus User Message Bus.
systemd[3260]: Started lxc-desktop-0.scope.
$ systemctl --user status lxc@desktop
● lxc@desktop.service - LXC Container: desktop
     Loaded: loaded (/home/admin/.config/systemd/user/lxc@.service; enabled; preset: enabled)
     Active: active (running) since Fri 2024-02-23 11:06:51 EST; 8min ago
       Docs: man:lxc-start
             man:lxc
   Main PID: 3269 (lxc-start)
     Memory: 808.0K (peak: 1.5M)
        CPU: 6ms
     CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/app-lxc.slice/lxc@desktop.service
             ‣ 3269 /usr/bin/lxc-start -F -n desktop

systemd[3260]: lxc@desktop.service: unit configures an IP firewall, but not running as root.
systemd[3260]: lxc@desktop.service: (This warning is only shown for the first unit using IP firewalling.)
systemd[3260]: Started LXC Container: desktop.

My current configuration:

$ lxc-start --version
5.0.2
$ uname -a
Linux ... 6.1.12-gentoo #1 SMP PREEMPT_DYNAMIC Sun Mar  5 17:08:32 EST 2023 x86_64 Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz GenuineIntel GNU/Linux
$ cat /proc/self/cgroup
0::/user.slice/user-0.slice/session-3.scope
$ cat /etc/subuid
root:1000000:65536
root:1000:1
admin:1000000:65536
admin:1000:1
