Comparison between debspawn and sbuild #27

Open
josch opened this issue Dec 10, 2021 · 2 comments

josch commented Dec 10, 2021

Hi,

sbuild maintainer here. Today I learned that debspawn exists and as I seek to reduce the list of System Build Tools I would like to find out what debspawn offers that sbuild does not.

For example the first paragraph claims that sbuild uses a plain chroot. True, it can do that. But it can also use any environment supported by autopkgtest (like lxc, lxd, qemu, schroot, ssh...) as well as the unshare backend which uses Linux user namespaces to do package builds. What advantage does systemd-nspawn have over that? I'm tempted to implement a systemd-nspawn backend and/or a debspawn backend into sbuild but it seems that systemd-nspawn requires super user privileges while the sbuild unshare backend does not (neither for the setup, nor for image creation, nor for running the build). Maybe you can point out in the README why a user would prefer to use debspawn over the sbuild unshare backend.

You state that a difference between sbuild and debspawn is unicode handling. Could you point me to the debspawn code that implements this so that I can understand what this paragraph means? Maybe this functionality should be included into sbuild.

You also state that sbuild works on OSes that are not Linux. I'd be interested to know which ones you are talking about. I have not heard of people using sbuild outside of Linux.

Then you state that debspawn is faster due to zstd tarballs and eatmydata. sbuild supports that as well plus lz4 tarballs which are again a tiny bit faster. Can you provide benchmarks to back up your claim?

Thanks!

cheers, josch

ximion (Member) commented Dec 11, 2021

Hi @josch !
First of all, the README is seriously outdated; there's stuff in there that is no longer true about debspawn, let alone sbuild. So, this likely needs an update (although I am actually inclined to remove everything that contrasts debspawn with other tools except for intentional differences to sbuild).

To give a bit of context: when debspawn was created, there were a few reasons for doing so (though not all of them existed initially):

  • I just wondered how easy it would be to build a tool like this, given that systemd-nspawn does some of the more complicated stuff already. So I just built a prototype after a "it will just take me a few hours" bet with friends at Debconf ;-)
  • I'm working on a tool called Laniakea which is a suite of tools to build Debian derivatives. It has a generic job runner called laniakea-spark (previously we used Paul Tagliamonte's debile) that was using sbuild back in the day. Setting up sbuild was always a bit annoying though, as it needed a bunch of configuration. We also had to parse the whole changelog again to extract whether builds failed reliably, and also why they failed: https://github.com/lkhq/laniakea-spark/blob/d0529862f48692483838c2e72bb4e96673fc45f7/spark/runners/sbuild.py#L138 - I thought that I could maybe simplify this by creating a tool that worked better for Laniakea
  • Back then at Purism we had people who had never touched Debian or built a Debian package, and setting up sbuild was still a bit more complex. With debspawn it's one command to create an image, then another one to build, and the environment is exactly the same as on the autobuilders, so reproducing failures is pretty easy.
  • We were going to run some semi-trusted builds (nothing dangerous, but something that could cause issues by accident), and systemd-nspawn being a container solution allows some neat features like limiting available memory or CPU resources to the build.
  • I wanted the build tool to provide a generic interface as well that allowed running arbitrary commands, so we could build disk images in the same environment we build packages in, and also run various QA actions.
  • Initially I thought I could use OverlayFS for some extremely fast build setups - that did not work out though (it could be done with some modifications, but I didn't want to make those at the time as tarball decompression turned out to be rather quick).
  • I wanted to integrate some Debian package caching, so we wouldn't also need to set up apt-cacher-ng on the builder machines, but have one tool handle all stuff transparently in the background.
  • I also wanted to keep ANSI color codes around, so we could maybe later do something neat in Laniakea's web UI and color the build logs in HTML (but also keep the raw log for analysis), as color is pretty useful to quickly see where the issues are in an extremely long build log.
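
To make the resource-limiting point in the list above concrete: systemd-nspawn accepts --property= assignments for the scope unit it runs in, which is one way to cap memory or CPU for a build. The following is a minimal Python sketch that only assembles such a command line. The helper name nspawn_command, the image path and the limit values are made up for illustration, and this is not necessarily how debspawn invokes systemd-nspawn.

```python
import shlex


def nspawn_command(image_dir, build_cmd, memory_max="4G", cpu_quota="200%"):
    """Assemble a systemd-nspawn command line with resource limits.

    Hypothetical helper: the limits are passed as unit properties via
    --property=, in the same format as `systemctl set-property`.
    """
    return [
        "systemd-nspawn",
        f"--directory={image_dir}",            # root directory of the container
        f"--property=MemoryMax={memory_max}",  # hard memory cap
        f"--property=CPUQuota={cpu_quota}",    # 200% = at most two full CPUs
        "--",
        *build_cmd,
    ]


if __name__ == "__main__":
    cmd = nspawn_command("/var/tmp/build-root", ["dpkg-buildpackage", "-us", "-uc"])
    # Print the command instead of running it; running it typically needs root.
    print(shlex.join(cmd))
```

The sketch keeps the limits as plain parameters so that an autobuilder could pick different caps per job.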

Debspawn is one of these projects that start out scratching one particular itch (being a well-integrated build tool for Laniakea) and then grow way beyond their initially intended scope :-)

> For example the first paragraph claims that sbuild uses a plain chroot. True, it can do that. But it can also use any environment supported by autopkgtest (like lxc, lxd, qemu, schroot, ssh...) as well as the unshare backend which uses Linux user namespaces to do package builds.

Could it do that 4 years ago already? If so, then I must have overlooked it. I know sbuild can do these things today, and even build via QEMU, so that statement is definitely wrong nowadays.

> What advantage does systemd-nspawn have over [unshare]?

I don't know how user namespaces are implemented exactly in sbuild - is unshare a chroot with user namespaces on top? Systemd-nspawn creates a pretty neat lightweight container that virtualizes the file system hierarchy, process tree, IPC subsystems, the host and domain name and also has a system call filter to filter out some potentially problematic syscalls (which actually can be an issue if packages need these for tests, so debspawn weakens that by default).
So, systemd-nspawn gives a bit more isolation and is a nice, pretty lightweight option if you want a container but don't need the complexity and dependencies of podman/docker & Co.

> I'm tempted to implement a systemd-nspawn backend and/or a debspawn backend into sbuild but it seems that systemd-nspawn requires super user privileges while the sbuild unshare backend does not (neither for the setup, nor for image creation, nor for running the build).

Systemd-nspawn can actually run without the need for superuser privileges if you tell it to use user namespaces - and I think debspawn should actually do that eventually. When playing with it a while back, I remember I ran into issues with it though. I can't recall what the actual problem was that made it a dealbreaker back then, which is a bit annoying (debspawn's method for dropping privileges inside of the container is also complicated, and user namespaces would have solved this). Maybe I should just try this again ;-)
The user namespace support for nspawn is actually quite extensive, see https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#User%20Namespacing%20Options for reference.

> Maybe you can point out in the README why a user would prefer to use debspawn over the sbuild unshare backend.

I am kind of tempted to drop comparisons to other tools, as they are likely not useful and get outdated sooner or later...

> You state that a difference between sbuild and debspawn is unicode handling. Could you point me to the debspawn code that implements this so that I can understand what this paragraph means? Maybe this functionality should be included into sbuild.

Oh, this was actually a huge debate years ago! Maybe https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=873919 can give some insight. Debian's default locale is "C", and therefore some stuff, especially Python code occasionally, falls flat when parsing unicode (file)names. Dpkg doesn't want to have an opinion on this. My opinion was that UTF-8 should be default (also to test things like building in a directory consisting of random emojis to see what fails - back then I was working on some QA to make sure software was handling unicode correctly).
This is implemented in debspawn by it setting LANG=C.UTF-8 in the build environment and "drawing" to the console using unicode characters to make readable build logs, unless this feature is disabled. IMHO this is a better default, but that is obviously a personal opinion.
tbh, I also don't know if anything has changed in the unicode-as-default department since 2016/2017.
I actually thought sbuild had a unicode option as well already, but maybe I imagined that.
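
As a small illustration of the locale point: the sketch below builds the environment for a child process so that it runs under C.UTF-8 instead of plain C. It is only a Python sketch of the idea; build_environment is a hypothetical name and this is not debspawn's actual code.

```python
import os
import subprocess
import sys


def build_environment(base=None, use_unicode=True):
    """Return a copy of the environment with the build locale set."""
    env = dict(os.environ if base is None else base)
    # LC_ALL and LC_CTYPE take precedence over LANG, so drop them to let LANG decide.
    for name in ("LC_ALL", "LC_CTYPE"):
        env.pop(name, None)
    env["LANG"] = "C.UTF-8" if use_unicode else "C"
    return env


if __name__ == "__main__":
    # Ask a child Python which encoding it would use for file names.
    # Recent Python versions coerce the C locale to UTF-8 themselves,
    # so both runs may report utf-8; older tools are less forgiving.
    probe = [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"]
    for flag in (True, False):
        result = subprocess.run(probe, env=build_environment(use_unicode=flag),
                                capture_output=True, text=True, check=True)
        print(flag, result.stdout.strip())
```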

> You also state that sbuild works on OSes that are not Linux. I'd be interested to know which ones you are talking about. I have not heard of people using sbuild outside of Linux.

I am not sure what past-me was thinking about there exactly, but I'm pretty sure it was Debian/kFreeBSD and Debian/Hurd. For the former, I definitely know people who use sbuild on it.

> Then you state that debspawn is faster due to zstd tarballs and eatmydata. sbuild supports that as well plus lz4 tarballs which are again a tiny bit faster. Can you provide benchmarks to back up your claim?

Back then I actually benchmarked this, but LZ4 would obviously beat zstd in most settings. The eatmydata stuff may be misleading, what I mean is that with sbuild you need (needed?) to configure this manually - debspawn just does it.
In general, debspawn intentionally doesn't offer a lot of configuration options, it's intended to just do whatever thing is "best" and doesn't actually let the developer mess with the build environment too much (the later versions actually don't follow this paradigm as much anymore, as debspawn is now used for package development as well, and not just as a dumb executor for an autobuilder).

I hope this clarifies things - I haven't read the README in full in almost 4 years, and maybe with your feedback I can clean it up quite a bit :-)

josch (Author) commented Dec 11, 2021

Hi Matthias, thanks a lot for your very extensive and detailed reply!

> Hi @josch ! First of all, the README is seriously outdated; there's stuff in there that is no longer true about debspawn, let alone sbuild. So, this likely needs an update (although I am actually inclined to remove everything that contrasts debspawn with other tools except for intentional differences to sbuild).

Even if it's not in the README, I think that listing what makes debspawn different/better than alternatives would be useful to have stated somewhere.

>   • I just wondered how easy it would be to build a tool like this, given that systemd-nspawn does some of the more complicated stuff already. So I just built a prototype after a "it will just take me a few hours" bet with friends at Debconf ;-)

Ah, reminds me of myself when I wrote mmdebstrap 3 years ago. XD

>   • I'm working on a tool called Laniakea which is a suite of tools to build Debian derivatives. It has a generic job runner called laniakea-spark (previously we used Paul Tagliamonte's debile) that was using sbuild back in the day. Setting up sbuild was always a bit annoying though, as it needed a bunch of configuration.

Yes. Today, there is the sbuild-debian-developer-setup package which hopefully makes the standard sbuild+schroot setup a bit easier.

> We also had to parse the whole changelog again to extract whether builds failed reliably, and also why they failed: https://github.com/lkhq/laniakea-spark/blob/d0529862f48692483838c2e72bb4e96673fc45f7/spark/runners/sbuild.py#L138 - I thought that I could maybe simplify this by creating a tool that worked better for Laniakea

Yes, the log parsing is still an issue. There is still no machine readable way to identify reliably why a build failed without using shaky log parsing.
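
To illustrate why this kind of log parsing is shaky, here is a minimal Python sketch that scans a build log for summary fields. It assumes the log ends with "Key: value" lines such as Status and Fail-Stage; the sample log is abbreviated and illustrative rather than real build output, and parse_summary is a hypothetical helper.

```python
import re

# Matches "Key: value" lines such as "Status: attempted".
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z-]*): (.*)$")


def parse_summary(log_text, wanted=("Status", "Fail-Stage")):
    """Return the last value seen for each wanted summary field."""
    found = {}
    for line in log_text.splitlines():
        match = FIELD_RE.match(line)
        if match and match.group(1) in wanted:
            # Later occurrences win, since the summary is at the end of the log.
            found[match.group(1)] = match.group(2).strip()
    return found


if __name__ == "__main__":
    # Abbreviated, illustrative log; not copied from a real build.
    sample = "\n".join([
        "dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2",
        "Fail-Stage: build",
        "Status: attempted",
    ])
    print(parse_summary(sample))
```

A package whose own build output happens to print a matching line can still confuse such a parser, which is why a machine-readable result would be more robust.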

>   • Back then at Purism we had people who had never touched Debian or built a Debian package, and setting up sbuild was still a bit more complex. With debspawn it's one command to create an image, then another one to build, and the environment is exactly the same as on the autobuilders, so reproducing failures is pretty easy.

Agreed. Sbuild is far from being that easy to use. Maybe I should write another set of wrapper scripts to make default usage as easy as using a single command.

>   • We were going to run some semi-trusted builds (nothing dangerous, but something that could cause issues by accident), and systemd-nspawn being a container solution allows some neat features like limiting available memory or CPU resources to the build.

Ah, that's where the root requirement of systemd-nspawn comes from, I guess.

>   • Initially I thought I could use OverlayFS for some extremely fast build setups - that did not work out though (it could be done with some modifications, but I didn't want to make those at the time as tarball decompression turned out to be rather quick).

Yes. While sbuild (and schroot) also supports overlayfs mounting, for my personal use I also resort to just unpacking chroot tarballs to a tmpfs as I found it to be sufficiently fast.

>   • I wanted to integrate some Debian package caching, so we wouldn't also need to set up apt-cacher-ng on the builder machines, but have one tool handle all stuff transparently in the background.

Understandable. I still struggle with apt-cacher-ng as it seems to somehow fill up the package cache without ever cleaning up and dropping cached packages after some time...

> > For example the first paragraph claims that sbuild uses a plain chroot. True, it can do that. But it can also use any environment supported by autopkgtest (like lxc, lxd, qemu, schroot, ssh...) as well as the unshare backend which uses Linux user namespaces to do package builds.
>
> Could it do that 4 years ago already? If so, then I must have overlooked it. I know sbuild can do these things today, and even build via QEMU, so that statement is definitely wrong nowadays.

The autopkgtest backend was introduced in sbuild 0.69.0, released in May 2016. The unshare backend was introduced in sbuild 0.77, released in July 2018. In any case, both backends were never advertised anywhere and I haven't yet received a single bug report for those two backends, so either nobody is using those or they are bug free (very unlikely).

> > What advantage does systemd-nspawn have over [unshare]?
>
> I don't know how user namespaces are implemented exactly in sbuild - is unshare a chroot with user namespaces on top? Systemd-nspawn creates a pretty neat lightweight container that virtualizes the file system hierarchy, process tree, IPC subsystems, the host and domain name and also has a system call filter to filter out some potentially problematic syscalls (which actually can be an issue if packages need these for tests, so debspawn weakens that by default). So, systemd-nspawn gives a bit more isolation and is a nice, pretty lightweight option if you want a container but don't need the complexity and dependencies of podman/docker & Co.

The unshare backend uses Linux user namespaces to unshare the mount, UTS, IPC, network, PID and user namespaces. I think that's also what other container solutions like docker, lxc and systemd-nspawn do. The unshare backend cannot limit resources like CPU or RAM and it cannot limit syscalls -- all these features would require being root at some point and I wanted to avoid that. I see that it might make sense to add a systemd-nspawn backend to make these features available to sbuild users.
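
For readers unfamiliar with these namespaces, the list above roughly corresponds to the following invocation of the unshare(1) tool from util-linux. This is a Python sketch that only builds the command line; sbuild's unshare backend has its own implementation, so the helper name unshare_command and the use of the unshare CLI are assumptions for illustration.

```python
import shlex


def unshare_command(inner_cmd):
    """Build an unshare(1) command that enters fresh namespaces without root."""
    return [
        "unshare",
        "--user", "--map-root-user",  # new user namespace, current user mapped to root
        "--mount",                    # private mount table
        "--uts",                      # private hostname and domain name
        "--ipc",                      # private System V IPC and POSIX message queues
        "--net",                      # private network stack
        "--pid", "--fork",            # new PID namespace; fork so the child runs inside it
        "--",
        *inner_cmd,
    ]


if __name__ == "__main__":
    # Print the command line rather than executing it.
    print(shlex.join(unshare_command(["hostname"])))
```

None of these flags limit CPU, memory or syscalls, which matches the limitation described above.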

> > I'm tempted to implement a systemd-nspawn backend and/or a debspawn backend into sbuild but it seems that systemd-nspawn requires super user privileges while the sbuild unshare backend does not (neither for the setup, nor for image creation, nor for running the build).
>
> Systemd-nspawn can actually run without the need for superuser privileges if you tell it to use user namespaces - and I think debspawn should actually do that eventually. When playing with it a while back, I remember I ran into issues with it though. I can't recall what the actual problem was that made it a dealbreaker back then, which is a bit annoying (debspawn's method for dropping privileges inside of the container is also complicated, and user namespaces would have solved this). Maybe I should just try this again ;-) The user namespace support for nspawn is actually quite extensive, see https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#User%20Namespacing%20Options for reference.

I haven't been able to make it work though. When I try to run systemd-nspawn with arguments like --private-users=pick --private-users-ownership=auto I still get the message "Need to be root.". I'd welcome any pointers that allow running systemd-nspawn without being root first.

Though to get features like RAM and CPU limits, root would definitely be needed. How does debspawn handle this? Does the user need to run sudo debspawn or is there a setuid root binary somewhere?

> > Maybe you can point out in the README why a user would prefer to use debspawn over the sbuild unshare backend.
>
> I am kind of tempted to drop comparisons to other tools, as they are likely not useful and get outdated sooner or later...

As the author of one of these tools, I actually find these comparisons useful.

> > You state that a difference between sbuild and debspawn is unicode handling. Could you point me to the debspawn code that implements this so that I can understand what this paragraph means? Maybe this functionality should be included into sbuild.
>
> Oh, this was actually a huge debate years ago! Maybe https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=873919 can give some insight. Debian's default locale is "C", and therefore some stuff, especially Python code occasionally, falls flat when parsing unicode (file)names. Dpkg doesn't want to have an opinion on this. My opinion was that UTF-8 should be default (also to test things like building in a directory consisting of random emojis to see what fails - back then I was working on some QA to make sure software was handling unicode correctly). This is implemented in debspawn by it setting LANG=C.UTF-8 in the build environment and "drawing" to the console using unicode characters to make readable build logs, unless this feature is disabled. IMHO this is a better default, but that is obviously a personal opinion. tbh, I also don't know if anything has changed in the unicode-as-default department since 2016/2017. I actually thought sbuild had a unicode option as well already, but maybe I imagined that.

Using C.UTF-8 has been the default in sbuild since 0.78.0 (released Jan 2019).

> > You also state that sbuild works on OSes that are not Linux. I'd be interested to know which ones you are talking about. I have not heard of people using sbuild outside of Linux.
>
> I am not sure what past-me was thinking about there exactly, but I'm pretty sure it was Debian/kFreeBSD and Debian/Hurd. For the former, I definitely know people who use sbuild on it.

Ah okay, I misunderstood and was thinking of Linux as in GNU/Linux but yes, you are right, the Debian ports buildds for hurd and kfreebsd definitely run sbuild.

> > Then you state that debspawn is faster due to zstd tarballs and eatmydata. sbuild supports that as well plus lz4 tarballs which are again a tiny bit faster. Can you provide benchmarks to back up your claim?
>
> Back then I actually benchmarked this, but LZ4 would obviously beat zstd in most settings. The eatmydata stuff may be misleading, what I mean is that with sbuild you need (needed?) to configure this manually - debspawn just does it. In general, debspawn intentionally doesn't offer a lot of configuration options, it's intended to just do whatever thing is "best" and doesn't actually let the developer mess with the build environment too much (the later versions actually don't follow this paradigm as much anymore, as debspawn is now used for package development as well, and not just as a dumb executor for an autobuilder).
>
> I hope this clarifies things - I haven't read the README in full in almost 4 years, and maybe with your feedback I can clean it up quite a bit :-)

Thanks a lot for your time! :-)
