Buildah images not so small? #532

Closed
tdudgeon opened this issue Mar 25, 2018 · 42 comments

@tdudgeon

Description

One of the key points of buildah is that it allows you to build small images without lots of extra fluff like yum and python. What I'm finding is that the images buildah creates are bigger than the traditional docker images, even though they don't contain this extra fluff.
What is happening here?

Steps to reproduce the issue:

This is all done on a new Centos7 cloud image with docker and buildah installed from RPMs.

First let's define our target.

$ docker pull centos:7
$ docker images | grep centos
docker.io/centos          7                   2d194b392dd1        2 weeks ago         195 MB

The Docker image is 195 MB in size.

Now let's create a minimal image with only the coreutils and bash packages added (the Docker image has both of these present). Here is the script I used:

#!/bin/bash

set -x

# build a minimal image
newcontainer=$(buildah from scratch)
scratchmnt=$(buildah mount $newcontainer)

# install the packages
yum install --installroot $scratchmnt bash coreutils --releasever 7 --setopt install_weak_deps=false -y
yum clean all -y --installroot $scratchmnt --releasever 7

sudo buildah config --cmd /bin/bash $newcontainer

# set some config info
buildah config --label name=centos-base $newcontainer

# commit the image
buildah unmount $newcontainer
buildah commit $newcontainer centos-base

Run this script:

$ sudo ./buildah-base.sh

Now let's look at the image that is built:

$ sudo buildah images
IMAGE ID             IMAGE NAME                                               CREATED AT             SIZE
8379315d3e3e         docker.io/library/centos-base:latest                     Mar 25, 2018 17:08     212.1 MB

Hey! The image is 212 MB in size, bigger than the Docker image. And looking into it confirms it does not have yum or python installed.
Why is it bigger, not smaller?

@pixdrift
Collaborator

pixdrift commented Mar 26, 2018

I don't think this is a direct comparison. If you track down the centos 7 base image, it looks like it's built using a base filesystem tarball rather than using yum to install the base system.

The centos 7 docker image on docker hub links to the dockerfile, the base filesystem tarball is in the repo:
https://github.com/CentOS/sig-cloud-instance-images/blob/02904503939756f540cfaa3fbafbf280e8a11bef/docker/Dockerfile

There are likely other unnecessary files stripped out of the centos images.

@rhatdan
Member

rhatdan commented Mar 26, 2018

@nalind @mtrmac Isn't this also an issue of compressed versus uncompressed?

@tdudgeon
Author

@pixdrift Yes, of course it's not an exact comparison as they were built differently.
But it's not what I was expecting.
The image built with buildah contains just the minimal scratch image (clocking in at a measly 1.77 KB for me) plus the bash and coreutils packages (which takes it up to 212 MB).
The centos Dockerhub image contains those same bash and coreutils packages plus python and yum, and maybe other things too. And despite these extra things it comes in at a smaller size.

@mtrmac
Collaborator

mtrmac commented Mar 26, 2018

@rhatdan
Member

rhatdan commented Mar 26, 2018

@tdudgeon Could you check to see where the extra size is coming from?
On the buildah container do

du -sm /*

To show if there is any weird space being used.

@rhatdan
Member

rhatdan commented Mar 26, 2018

Could this be the CLanguage bindings?

@mtrmac
Collaborator

mtrmac commented Mar 26, 2018

It might be helpful to start with determining whether the difference in size is due to the container contents, or due to the tooling.

Is the ratio between du -sc / and the size reported by docker images/buildah images roughly equal (i.e. the content is the difference), or significantly different (i.e. the tooling is the difference)?
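
The two checks suggested above (per-directory usage and total content size) can be sketched as a pair of shell helpers. This is only an illustration: the function names top_level_sizes and total_size_mb are hypothetical, and the directory passed in is assumed to be the path returned by buildah mount.

```shell
#!/bin/bash
# Hypothetical helpers (not part of buildah) for inspecting a mounted container root.

# Print the size in MB of each top-level entry under ROOT, largest first.
top_level_sizes() {
    local root=$1
    [ -d "$root" ] || { echo "not a directory: $root" >&2; return 1; }
    # -x stays on one filesystem, -s summarises each argument, -m reports MB.
    du -xsm "$root"/* 2>/dev/null | sort -rn
}

# Print the total size in MB of everything under ROOT.
total_size_mb() {
    local root=$1
    [ -d "$root" ] || { echo "not a directory: $root" >&2; return 1; }
    du -xsm "$root" | cut -f1
}
```

With a working container, something like top_level_sizes "$(buildah mount $newcontainer)" would list the largest directories, and the total can then be compared with the SIZE column of buildah images.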

@pixdrift
Collaborator

pixdrift commented Mar 26, 2018

Not a direct comparison, but another data point. Used the provided script to build from Oracle Linux 7 repo.

# buildah images
IMAGE ID             IMAGE NAME                                               CREATED AT             SIZE
95cc7dba2f97         docker.io/library/buildah-ol7:latest                     Mar 26, 2018 22:28     177.4 MB

Then pushed to the docker-daemon using buildah push

# docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
buildah                       ol7                 95cc7dba2f97        4 minutes ago       182 MB

So there is a minor discrepancy in reported size, but in this case buildah is less than docker. Need to build in docker too for comparison.

@pixdrift
Collaborator

pixdrift commented Mar 26, 2018

I built the centos:7 container from the upstream Dockerfile using buildah bud:

# buildah images
IMAGE ID             IMAGE NAME                                               CREATED AT             SIZE
102faebad41b         <none>                                                   Mar 26, 2018 23:15     194.5 MB

Pushed to docker:

# docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
centos                        7                   ea2fac082cce        3 minutes ago       195 MB

Looks like the tooling difference. The kickstart used to build the centos 7 filesystem image posted by @mtrmac shows the initial package selection is quite different.

@pixdrift
Collaborator

pixdrift commented Mar 27, 2018

@tdudgeon

Pulling apart the ks file posted above and looking in the image, it looks like the documentation (~4MB) and locale-archive (~99MB) are what is causing the size issues. If you force your locale in the yum installer and specify nodocs, you will get a significantly smaller base image:

# docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED             SIZE
buildah                       stripped            4ddfc8034046        6 minutes ago       56.4 MB

Updated buildah script:

#!/bin/bash

set -x

# build a minimal image
newcontainer=$(buildah from scratch)
scratchmnt=$(buildah mount $newcontainer)

# install the packages
yum install --installroot $scratchmnt bash coreutils --releasever 7 --setopt=install_weak_deps=false --setopt=tsflags=nodocs --setopt=override_install_langs=en_US.utf8 -y
yum clean all -y --installroot $scratchmnt --releasever 7

sudo buildah config --cmd /bin/bash $newcontainer

# set some config info
buildah config --label name=centos-base $newcontainer

# commit the image
buildah unmount $newcontainer
buildah commit $newcontainer centos-base

Interested to know if this works ok from your CentOS source, and what kind of result you get with regard to size.
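
To see whether the same locations dominate in another install root, a small report along these lines may help. It is a sketch only: bloat_report is a hypothetical name, and the path list is taken from the locations named in this thread (locale-archive, documentation, locale data, and the yum cache that comes up later).

```shell
#!/bin/bash
# Hypothetical helper: report the size in KB of locations this thread flags as large.
# Paths are relative to the container root; missing ones are skipped.
bloat_report() {
    local root=$1 path
    [ -d "$root" ] || { echo "not a directory: $root" >&2; return 1; }
    for path in usr/lib/locale/locale-archive usr/share/doc usr/share/locale var/cache/yum; do
        if [ -e "$root/$path" ]; then
            # du -sk prints "<KB><TAB><path>"; keep only the number.
            printf '%s\t%s\n' "$(du -sk "$root/$path" | cut -f1)" "/$path"
        fi
    done
}
```

Running bloat_report "$scratchmnt" before buildah unmount, once with and once without the extra --setopt options, should make the effect of each option visible.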

@tdudgeon
Author

@pixdrift Hey that makes a big difference. In my case it builds an image that is 91.56 MB in size. Much smaller than the original 212.1 MB (though still a fair bit bigger than your one of 56.4 MB).

@pixdrift
Collaborator

@tdudgeon I suspect the difference in size is due to package dependency creep in the included packages. The build I was using may be from a 7.2/7.3 repo (was a random dev instance I had). Will have a closer look tomorrow and point it at 7.4. In total, I believe there were 20 rpms installed.

@rhatdan
Member

rhatdan commented Mar 27, 2018

@tdudgeon Is Buildah now smaller than the docker build?

@rhatdan
Member

rhatdan commented Mar 27, 2018

@ipbabble Might be worth a blog on how to handle languages and make smaller images.

@tdudgeon
Author

@rhatdan Yes, the centos:7 image on Docker Hub is 195 MB whilst my latest equivalent with buildah is 91.56 MB. So just under half the size.

@rhatdan
Member

rhatdan commented Mar 27, 2018

WooHoo.
BTW Size is one important factor, but another factor customers look at is the number of packages/files inside of a container. They are looking to limit attack surface, with the theory that the fewer files/executables in the image the harder it is to exploit a container.
So you might want to get a count of RPMs installed.

@tdudgeon
Author

The Docker Hub centos:7 image has 143 packages.
The one built by buildah has 64 packages.
But I had to yum install rpm so that I could count them, so that should really be 63.
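
A possible way to count packages without installing rpm into the image is to query the image's RPM database from the host while the container is still mounted. The sketch below assumes the host has rpm and that the install root contains the database yum created; count_rpms is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: count packages recorded in the RPM database under ROOT,
# using the host's rpm binary. ROOT should be an absolute path, such as the
# mount point returned by "buildah mount".
count_rpms() {
    local root=$1
    [ -d "$root" ] || { echo "not a directory: $root" >&2; return 1; }
    # --root makes rpm use ROOT as the top-level directory for the query.
    rpm --root "$root" -qa | wc -l | tr -d ' '
}
```

Called as count_rpms "$scratchmnt" before buildah unmount, this leaves the image contents untouched.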

@mohammedzee1000

mohammedzee1000 commented Mar 27, 2018

You guys might also want to take a look at the atomic image https://github.com/CentOS/atomic-container.

It is built using microdnf and is already as small as 78 MB

@pixdrift
Collaborator

pixdrift commented Mar 27, 2018

@mohammedzee1000, Thanks for the suggestion, this image uses essentially the same package selection as I have outlined above with the os release, microdnf and systemd added.. then some manual cleanup of the resulting filesystem.

%packages --excludedocs --nobase --nocore --instLangs=en
bash
centos-release
microdnf
systemd

I am interested to know how my OL7 image ended up so much smaller (package count and size) than CentOS, I can only assume dependency changes.

@tdudgeon, can you post an RPM list from your container and I will put together a comparison? The yum installation log from buildah should be enough and won't require modifications to the image contents.

@giuseppe
Member

you can save some space removing the locales you don't need. This should be quite safe to do:

rm $scratchmnt/usr/lib/locale/locale-archive*
find $scratchmnt/usr/share/locale/ -mindepth 1 -maxdepth 1 \! -name '*en*' -exec rm -rf \{\} \;

@pixdrift
Collaborator

pixdrift commented Mar 27, 2018

@giuseppe, this is redundant if you use the yum parameter I have provided above (--setopt=override_install_langs=en_US.utf8) because alternate locales aren't installed.

The locale-archive when specifying the language in yum is 1.1M instead of the default 100M, this is the primary change that saved the space for @tdudgeon

@pixdrift
Collaborator

pixdrift commented Mar 27, 2018

Resulting OL7 package list from my posted buildah script above (image size: 56.4MB)

(1/40): basesystem-10.0-7.0.1.el7.noarch.rpm
(2/40): bash-4.2.46-29.el7_4.x86_64.rpm
(3/40): ca-certificates-2017.2.14-71.el7.noarch.rpm
(4/40): chkconfig-1.7.4-1.el7.x86_64.rpm
(5/40): coreutils-8.22-18.0.1.el7.x86_64.rpm
(6/40): filesystem-3.2-21.el7.x86_64.rpm
(7/40): gawk-4.0.2-4.el7_3.1.x86_64.rpm
(8/40): glibc-2.17-196.el7.x86_64.rpm
(9/40): glibc-common-2.17-196.el7.x86_64.rpm
(10/40): gmp-6.0.0-15.el7.x86_64.rpm
(11/40): grep-2.20-3.el7.x86_64.rpm
(12/40): info-5.1-4.el7.x86_64.rpm
(13/40): keyutils-libs-1.5.8-3.el7.x86_64.rpm
(14/40): krb5-libs-1.15.1-8.el7.x86_64.rpm
(15/40): libacl-2.2.51-12.el7.x86_64.rpm
(16/40): libattr-2.4.46-12.el7.x86_64.rpm
(17/40): libcap-2.22-9.el7.x86_64.rpm
(18/40): libcom_err-1.42.9-10.0.1.el7.x86_64.rpm
(19/40): libffi-3.0.13-18.el7.x86_64.rpm
(20/40): libgcc-4.8.5-16.el7.x86_64.rpm
(21/40): libselinux-2.5-11.el7.x86_64.rpm
(22/40): libsepol-2.5-6.el7.x86_64.rpm
(23/40): libstdc++-4.8.5-16.el7.x86_64.rpm
(24/40): libtasn1-4.10-1.el7.x86_64.rpm
(25/40): libverto-0.2.5-4.el7.x86_64.rpm
(26/40): ncurses-5.9-14.20130511.el7_4.x86_64.rpm
(27/40): ncurses-base-5.9-14.20130511.el7_4.noarch.rpm
(28/40): ncurses-libs-5.9-14.20130511.el7_4.x86_64.rpm
(29/40): nspr-4.13.1-1.0.el7_3.x86_64.rpm
(30/40): nss-softokn-freebl-3.28.3-8.el7_4.x86_64.rpm
(31/40): openssl-libs-1.0.2k-8.0.1.el7.x86_64.rpm
(32/40): p11-kit-0.23.5-3.el7.x86_64.rpm
(33/40): p11-kit-trust-0.23.5-3.el7.x86_64.rpm
(34/40): pcre-8.32-17.el7.x86_64.rpm
(35/40): popt-1.13-16.el7.x86_64.rpm
(36/40): redhat-release-server-7.4-18.0.1.el7.x86_64.rpm
(37/40): sed-4.2.2-5.el7.x86_64.rpm
(38/40): setup-2.8.71-7.el7.noarch.rpm
(39/40): tzdata-2017b-1.el7.noarch.rpm
(40/40): zlib-1.2.7-17.el7.x86_64.rpm

The same when using RHEL 7.4 (image size: 57.08 MB):

(1/40): basesystem-10.0-7.el7.noarch.rpm
(2/40): bash-4.2.46-29.el7_4.x86_64.rpm
(3/40): ca-certificates-2017.2.14-71.el7.noarch.rpm
(4/40): chkconfig-1.7.4-1.el7.x86_64.rpm
(5/40): coreutils-8.22-18.el7.x86_64.rpm
(6/40): filesystem-3.2-21.el7.x86_64.rpm
(7/40): gawk-4.0.2-4.el7_3.1.x86_64.rpm
(8/40): glibc-2.17-196.el7_4.2.x86_64.rpm
(9/40): glibc-common-2.17-196.el7_4.2.x86_64.rpm
(10/40): gmp-6.0.0-15.el7.x86_64.rpm
(11/40): grep-2.20-3.el7.x86_64.rpm
(12/40): info-5.1-4.el7.x86_64.rpm
(13/40): keyutils-libs-1.5.8-3.el7.x86_64.rpm
(14/40): krb5-libs-1.15.1-8.el7.x86_64.rpm
(15/40): libacl-2.2.51-12.el7.x86_64.rpm
(16/40): libattr-2.4.46-12.el7.x86_64.rpm
(17/40): libcap-2.22-9.el7.x86_64.rpm
(18/40): libcom_err-1.42.9-10.el7.x86_64.rpm
(19/40): libffi-3.0.13-18.el7.x86_64.rpm
(20/40): libgcc-4.8.5-16.el7_4.2.x86_64.rpm
(21/40): libselinux-2.5-11.el7.x86_64.rpm
(22/40): libsepol-2.5-6.el7.x86_64.rpm
(23/40): libstdc++-4.8.5-16.el7_4.2.x86_64.rpm
(24/40): libtasn1-4.10-1.el7.x86_64.rpm
(25/40): libverto-0.2.5-4.el7.x86_64.rpm
(26/40): ncurses-5.9-14.20130511.el7_4.x86_64.rpm
(27/40): ncurses-base-5.9-14.20130511.el7_4.noarch.rpm
(28/40): ncurses-libs-5.9-14.20130511.el7_4.x86_64.rpm
(29/40): nspr-4.13.1-1.0.el7_3.x86_64.rpm
(30/40): nss-softokn-freebl-3.28.3-8.el7_4.x86_64.rpm
(31/40): openssl-libs-1.0.2k-8.el7.x86_64.rpm
(32/40): p11-kit-0.23.5-3.el7.x86_64.rpm
(33/40): p11-kit-trust-0.23.5-3.el7.x86_64.rpm
(34/40): pcre-8.32-17.el7.x86_64.rpm
(35/40): popt-1.13-16.el7.x86_64.rpm
(36/40): redhat-release-server-7.4-18.el7.x86_64.rpm
(37/40): sed-4.2.2-5.el7.x86_64.rpm
(38/40): setup-2.8.71-7.el7.noarch.rpm
(39/40): tzdata-2018d-1.el7.noarch.rpm
(40/40): zlib-1.2.7-17.el7.x86_64.rpm

Interested to know why CentOS 7 is larger using the same process.

@ipbabble
Contributor

ipbabble commented Mar 27, 2018 via email

@pixdrift
Collaborator

Something strange is definitely happening. I just ran the same script as above on a fresh CentOS 7.4 build.. and I also got a 91.57MB result, which is 40MB bigger than either RHEL 7 or OL 7. The package list is the same number (40).. so something is odd. Going to go through and compare images now. Here is the package list from the CentOS 7.4 image.

(1/40): basesystem-10.0-7.el7.centos.noarch.rpm
(2/40): centos-release-7-4.1708.el7.centos.x86_64.rpm
(3/40): bash-4.2.46-29.el7_4.x86_64.rpm
(4/40): filesystem-3.2-21.el7.x86_64.rpm
(5/40): gawk-4.0.2-4.el7_3.1.x86_64.rpm
(6/40): chkconfig-1.7.4-1.el7.x86_64.rpm
(7/40): ca-certificates-2017.2.14-71.el7.noarch.rpm
(8/40): grep-2.20-3.el7.x86_64.rpm
(9/40): info-5.1-4.el7.x86_64.rpm
(10/40): keyutils-libs-1.5.8-3.el7.x86_64.rpm
(11/40): gmp-6.0.0-15.el7.x86_64.rpm
(12/40): libacl-2.2.51-12.el7.x86_64.rpm
(13/40): libattr-2.4.46-12.el7.x86_64.rpm
(14/40): libcap-2.22-9.el7.x86_64.rpm
(15/40): libcom_err-1.42.9-10.el7.x86_64.rpm
(16/40): libffi-3.0.13-18.el7.x86_64.rpm
(17/40): krb5-libs-1.15.1-8.el7.x86_64.rpm
(18/40): libselinux-2.5-11.el7.x86_64.rpm
(19/40): coreutils-8.22-18.el7.x86_64.rpm
(20/40): libsepol-2.5-6.el7.x86_64.rpm
(21/40): libgcc-4.8.5-16.el7_4.2.x86_64.rpm
(22/40): libverto-0.2.5-4.el7.x86_64.rpm
(23/40): libtasn1-4.10-1.el7.x86_64.rpm
(24/40): libstdc++-4.8.5-16.el7_4.2.x86_64.rpm
(25/40): ncurses-5.9-14.20130511.el7_4.x86_64.rpm
(26/40): nspr-4.13.1-1.0.el7_3.x86_64.rpm
(27/40): ncurses-libs-5.9-14.20130511.el7_4.x86_64.rpm
(28/40): nss-softokn-freebl-3.28.3-8.el7_4.x86_64.rpm
(29/40): ncurses-base-5.9-14.20130511.el7_4.noarch.rpm
(30/40): p11-kit-0.23.5-3.el7.x86_64.rpm 
(31/40): p11-kit-trust-0.23.5-3.el7.x86_64.rpm
(32/40): popt-1.13-16.el7.x86_64.rpm
(33/40): sed-4.2.2-5.el7.x86_64.rpm
(34/40): pcre-8.32-17.el7.x86_64.rpm
(35/40): setup-2.8.71-7.el7.noarch.rpm
(36/40): zlib-1.2.7-17.el7.x86_64.rpm
(37/40): tzdata-2018d-1.el7.noarch.rpm
(38/40): glibc-2.17-196.el7_4.2.x86_64.rpm
(39/40): openssl-libs-1.0.2k-8.el7.x86_64.rpm
(40/40): glibc-common-2.17-196.el7_4.2.x86_64.rpm
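
Download logs in the "(n/N): name.rpm" format shown above can be compared mechanically rather than by eye. The following is a minimal sketch, assuming each log has been saved to a text file; pkg_names and pkg_diff are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers for comparing yum download logs.

# Strip the "(n/N): " progress prefix and the ".rpm" suffix, then sort.
pkg_names() {
    sed -n 's/^([0-9]*\/[0-9]*): \(.*\)\.rpm[[:space:]]*$/\1/p' "$1" | LC_ALL=C sort -u
}

# Print packages unique to either log:
#   "< pkg" appears only in the first file, "> pkg" only in the second.
pkg_diff() {
    local a b
    a=$(mktemp) && b=$(mktemp) || return 1
    pkg_names "$1" > "$a"
    pkg_names "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b" | sed 's/^/< /'
    LC_ALL=C comm -13 "$a" "$b" | sed 's/^/> /'
    rm -f "$a" "$b"
}
```

Because the comparison is on full name-version-release strings, a rebuilt package with a different release tag shows up as a difference even when the package set is otherwise the same.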

@pixdrift
Collaborator

The problem in CentOS was yum cache data not being cleaned up correctly. In my case it was the epel repo. This could be solved by adding '--disablerepo=epel' to the yum command, but people may want to install packages from there as part of the image creation.

I have an updated script here which uses rm to clean up the yum cache, and it brings the CentOS 7 image down below 57MB.
https://gitlab.com/pixdrift/buildah-scripts/blob/master/el7.minimal.sh

@TomSweeneyRedHat
Member

@pixdrift @tdudgeon FYI, I just posted a little blog on this issue at http://www.projectatomic.io/blog/2018/04/open-source-what-a-concept/. Thanks a bunch for inspiring it and for your contributions here!

nalind pushed a commit that referenced this issue Apr 2, 2018
Cull funcs from runtime_img.go which are no longer needed.  Also, fix any remaining
spots that use the old image technique.

Signed-off-by: baude <bbaude@redhat.com>

Closes: #532
Approved by: mheon
@pixdrift
Collaborator

pixdrift commented Apr 2, 2018

@TomSweeneyRedHat, thanks for posting the article. It should be noted that I identified two further things in this thread that are worth mentioning in the blog post.

  1. The ~40MB that is keeping the image at 92MB is the yum cache for epel. If the method of cache cleanup is changed to rm (rather than yum clean in the posted script), the image size reliably comes back at around 57MB for OL7, RHEL7 and CentOS 7.
  2. The option --setopt=install_weak_deps=false doesn't reduce the size of this image in my testing, so I have removed it to avoid unnecessary complexity.

Updated script is here:
https://gitlab.com/pixdrift/buildah-scripts/blob/master/el7.minimal.sh

@rhatdan
Member

rhatdan commented Apr 2, 2018

@pixdrift I think the --setopt=install_weak_deps might have some effect on a Fedora system. I don't believe RHEL or CentOS support weak dependencies.

@pixdrift
Collaborator

pixdrift commented Apr 2, 2018

Thanks @rhatdan, that helps explain it. I would expect a Fedora script to do the same (create a minimal base image) would be using dnf at this point in time, so the option may no longer be relevant there either.. unless the setopt option remained the same. I don't spend much time out of EL, but will take a look for interest's sake.

@tdudgeon
Author

tdudgeon commented Apr 3, 2018

Just for completeness, I tried rebuilding the base centos7 image but with the extra rm -rf $scratchmnt/var/cache/yum command suggested by @pixdrift to clean up the cache, and the image size drops from 91.6 MB to 57.15 MB. Not bad seeing as we started at 212 MB!

Thanks all!
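
One caution with the rm -rf cleanup: if $scratchmnt is ever empty (for example because buildah mount failed), the command expands to rm -rf /var/cache/yum and hits the host's own cache. A guarded variant might look like the sketch below; clean_yum_cache is a hypothetical name, not something taken from the linked script.

```shell
#!/bin/bash
# Hypothetical helper: remove the yum cache from an install root,
# refusing to run when the root is empty, "/", or not a directory.
clean_yum_cache() {
    local root=$1
    if [ -z "$root" ] || [ "$root" = "/" ] || [ ! -d "$root" ]; then
        echo "refusing to clean cache under '$root'" >&2
        return 1
    fi
    rm -rf "$root/var/cache/yum"
}
```

In the build script this would replace the bare rm -rf line, for example clean_yum_cache "$scratchmnt" just before buildah unmount.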

@gbraad
Member

gbraad commented Apr 3, 2018

Maybe an idea to use https://github.com/GoogleCloudPlatform/container-diff as it recently got support for RPM, but it can help with comparing containers even just on file level.

@TomSweeneyRedHat
Member

@pixdrift yep, noted the additional input from you. I didn't want to add it to the blog post as I've found a blog length of about 4 pages in a word processor software is about as long as you want. So I tried to show the initial breadcrumbs in the blog and then gave a couple of pointers and a tease to this issue here so they could dive even deeper. I do very much appreciate your contributions here though, it's been some really great work.

@rhatdan
Member

rhatdan commented Apr 3, 2018

I might take a stab at a blog on this from a security point of view.

@pixdrift
Collaborator

pixdrift commented Apr 4, 2018

Would there be value capturing some of these buildah scripts for base OS container builds in contrib?

@rhatdan
Member

rhatdan commented Apr 4, 2018

Sure maybe an examples directory.

@TomSweeneyRedHat
Member

@pixdrift I was thinking about that, didn't know if it made sense there, examples and/or tutorials. But I definitely wanted to save at least the final result somewhere after the dust settled.

@pixdrift
Collaborator

pixdrift commented Apr 6, 2018

The following may also be interesting to people following this thread:
https://gitlab.com/pixdrift/buildah-scripts/blob/master/el7.ansible.minimal.sh

This example uses pip from the host (python2-pip) to install Python packages into the container, so the container doesn't need pip and its dependencies installed or any compilers installed to build source packages from pypi.

In this example the pip module is ansible (which could have just as easily been installed as an rpm), but it's really to demonstrate building containers to run Python code as it is a use case I see repeated in EL7 environments. In this case I am developing an apb-base style image using buildah to run Ansible playbooks.

The resulting image in this case which includes python + Ansible 2.5.0 and all required dependencies is around 150MB. Leaving pip outside the container looks to save around 10MB (depending on method).. and more if compilers etc. are required.

Still determining if there is any impact to the pip installation on the host, but looks good so far.

@rhatdan
Member

rhatdan commented Apr 6, 2018

@pixdrift Want to write a blog describing this?

@rhatdan
Member

rhatdan commented May 25, 2018

Blogs have been written explaining this.

@rhatdan rhatdan closed this as completed May 25, 2018
@tdudgeon
Author

Just for the record I finally got round to writing this up as a blog post:
https://www.informaticsmatters.com/blog/2018/05/31/smaller-containers-part-3.html
@pixdrift @rhatdan @TomSweeneyRedHat and others - thanks for your help!

@TomSweeneyRedHat
Member

Excellent news @tdudgeon , thanks for sharing!

@pixdrift
Collaborator

I should probably (finally) mention I did do a write up (months ago) that included the same process for EL8, with comparisons to the Red Hat UBI images. In case someone is stumbling across this thread and looking for additional content on the subject, I posted it here, with the README.md describing the outcomes:

https://gitlab.com/pixdrift/buildah-scripts

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 10, 2023