Enormous size of diffs #3110

Closed
iemejia opened this issue Dec 7, 2013 · 14 comments

Comments

@iemejia

iemejia commented Dec 7, 2013

I'm a new user of docker and to make my first tests I decided to run
the vagrant ubuntu 12.04 image from your repo (git clone + vagrant
up), which installed docker 0.7.

Once I did this I created a basic Dockerfile that just pulls the ubuntu
12.04 image, changes the deb sources and updates the package lists. This
gave me my first image (iemejia/test). The size difference between my
image and the base image is already big (144.5 MB), which I don't
understand since I don't think the update downloaded that much, but maybe.

However, when I run bash in the new image (iemejia/test) and then
'apt-get install tree' to install the tree package, which is a mere
28.4 kB in size (compressed), the resulting image (iemejia/test2)
is ridiculously bigger: its layer is 33.93 MB, and there's no way that
the changes were this big. Is there an error there, or am I doing it
wrong? What else is saved in each docker layer that makes it so big?

I add the log of my test below, and I just uploaded the images to
index.docker.io in case some of you want to check.

Thanks.

@iemejia
Author

iemejia commented Dec 7, 2013

vagrant@precise64:~/dockerfiles/test$ cat Dockerfile
# docker-version        0.7
FROM        ubuntu:12.04

# Build dependencies
# Update remote package metadata
RUN echo 'deb http://archive.ubuntu.com/ubuntu precise main universe' > /etc/apt/sources.list
RUN apt-get update -q

vagrant@precise64:~/dockerfiles/test$ docker build -t iemejia/test .
Uploading context 10240 bytes
Step 1 : FROM ubuntu:12.04
 ---> 8dbd9e392a96
Step 2 : RUN echo 'deb http://archive.ubuntu.com/ubuntu precise main universe' > /etc/apt/sources.list
 ---> Running in fde008377f14
 ---> 9c9189d04566
Step 3 : RUN apt-get update -q
 ---> Running in 67765a12e83b
Ign http://archive.ubuntu.com precise InRelease
Hit http://archive.ubuntu.com precise Release.gpg
Hit http://archive.ubuntu.com precise Release
Hit http://archive.ubuntu.com precise/main amd64 Packages
Get:1 http://archive.ubuntu.com precise/universe amd64 Packages [6167 kB]
Get:2 http://archive.ubuntu.com precise/main i386 Packages [1641 kB]
Get:3 http://archive.ubuntu.com precise/universe i386 Packages [6180 kB]
Get:4 http://archive.ubuntu.com precise/main TranslationIndex [3706 B]
Get:5 http://archive.ubuntu.com precise/universe TranslationIndex [2922 B]
Get:6 http://archive.ubuntu.com precise/main Translation-en [893 kB]
Get:7 http://archive.ubuntu.com precise/universe Translation-en [4133 kB]
Fetched 19.0 MB in 20s (907 kB/s)
Reading package lists...
 ---> 540c708ca769
Successfully built 540c708ca769

vagrant@precise64:~/dockerfiles/test$ docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             SIZE
iemejia/test           latest              540c708ca769        57 seconds ago      144.5 MB (virtual 272.5 MB)
ubuntu                 12.04               8dbd9e392a96        7 months ago        128 MB (virtual 128 MB)

vagrant@precise64:~/dockerfiles/test$ docker run -i -t iemejia/test /bin/bash
root@f41499564819:/# apt-get install tree
Reading package lists... Done
Building dependency tree... Done
The following NEW packages will be installed:
  tree
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 28.4 kB of archives.
After this operation, 102 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu/ precise/universe tree amd64 1.5.3-2 [28.4 kB]
Fetched 28.4 kB in 0s (115 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package tree.
(Reading database ... 7545 files and directories currently installed.)
Unpacking tree (from .../tree_1.5.3-2_amd64.deb) ...
Setting up tree (1.5.3-2) ...
root@f41499564819:/# exit
exit

vagrant@precise64:~/dockerfiles/test$ docker commit f41499564819 iemejia/test2
b80532effd605458425031e56a90aa12160d3450602ff35c04ec25980abf0e96
vagrant@precise64:~/dockerfiles/test$ docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             SIZE
iemejia/test2          latest              b80532effd60        4 seconds ago       33.93 MB (virtual 306.4 MB)
iemejia/test           latest              540c708ca769        3 minutes ago       144.5 MB (virtual 272.5 MB)
ubuntu                 12.04               8dbd9e392a96        7 months ago        128 MB (virtual 128 MB)

@crosbymichael
Contributor

@iemejia Adding another apt repo and doing an update pulls down a lot of information. An apt-get update with a new repo can take anywhere between 30-60 MB or more.

@iemejia
Author

iemejia commented Dec 7, 2013

Yes, I agree, but what about the difference between the test and test2 images, where the only actions I executed were running bash and the apt-get install of the tree package? That is about 34 MB, isn't that too much?

@unclejack
Contributor

@iemejia If you run docker diff on a container, you can find out exactly what files were created and modified. A 33 MB layer isn't much for an apt-get install operation. Apt creates some other files and it changes some internal databases as well when installing a package.
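
As a rough illustration of the docker diff suggestion above, the sketch below totals the bytes behind the paths that docker diff reports. It assumes the container filesystem has first been unpacked into a local directory (for example with docker export piped into tar); the function name sum_diff_sizes and the /tmp/rootfs path are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# sum_diff_sizes ROOT
# Reads `docker diff`-style lines ("A /path", "C /path", "D /path") on stdin
# and prints the total size in bytes of the added or changed regular files
# found under ROOT. The function name is hypothetical, not a Docker feature.
sum_diff_sizes() {
    root=$1
    total=0
    while read -r kind path; do
        # Deleted entries add no file data to the layer.
        [ "$kind" = "D" ] && continue
        file="$root$path"
        # Changed directories are listed too; only regular files carry data.
        if [ -f "$file" ] && [ ! -L "$file" ]; then
            size=$(wc -c < "$file")
            total=$((total + size))
        fi
    done
    echo "$total"
}

# Possible usage (assumes Docker is installed and /tmp/rootfs exists):
#   docker export f41499564819 | tar -x -C /tmp/rootfs
#   docker diff f41499564819 | sum_diff_sizes /tmp/rootfs
```

The total is only an approximation of the layer size, since it ignores metadata and any filesystem overhead, but it is usually enough to spot which few files dominate a layer.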

@iemejia
Author

iemejia commented Dec 8, 2013

Well, it's true that if it changes the APT database and you save the whole thing the change can be big, but I'm still negatively impressed by the size of the diff. I'm just curious about what docker saves in its diffs, is there any doc about it?
Have you thought about a way to optimize this, like a post 'gc' step or something like that (I don't know if it's a filesystem issue too, btw)? I mean, docker is ultra nice, but the idea of adding a 28.4 kB deb file and getting a 34 MB diff is kind of shocking (since on my normal linux that change takes 102 kB of extra space in the end).
Maybe packing deltas like git does would be a possible solution.
Ref. http://git-scm.com/book/en/Git-Internals-Packfiles

@crosbymichael
Contributor

@iemejia Docker does not save anything extra in the diff.

You see a bigger increase with docker because the base images start with NOTHING, and a simple apt-get can pull down a lot of data the first time.

@unclejack
Contributor

As @crosbymichael has stated above, layers aren't like git diffs. Layers contain the complete changed files, not just the small bits which changed.

I've just run apt-get update in a fresh container and I also have a 28MB container. It was larger in your case. It's that big because you've enabled universe and some of those lists were retrieved again.

This isn't really a Docker problem. I'll close this issue now. Please feel free to ask questions on the docker-user Google group and in the #docker channel on freenode.
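
The point that a layer holds complete changed files rather than byte-level deltas can be illustrated without Docker. The sketch below is a simulation only: two plain directories stand in for the parent layer and the new layer, and the helper names copy_up_and_append and layer_bytes are hypothetical. It assumes whole-file copy-up, which is how the behaviour is described in this thread.

```shell
#!/bin/sh
# Simulation only: "lower" stands in for the read-only parent layer and
# "upper" for the new layer. No Docker or union filesystem is involved.

# copy_up_and_append LOWER UPPER RELPATH DATA
copy_up_and_append() {
    lower=$1 upper=$2 rel=$3 data=$4
    mkdir -p "$upper/$(dirname "$rel")"
    # The whole file is copied into the new layer before the first write...
    cp "$lower/$rel" "$upper/$rel"
    # ...and only then is the (possibly tiny) change applied.
    printf '%s' "$data" >> "$upper/$rel"
}

# layer_bytes DIR: total bytes of all regular files under DIR.
layer_bytes() {
    find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

work=$(mktemp -d)
mkdir -p "$work/lower/var/cache/apt" "$work/upper"
# A 1 MiB stand-in for a large binary file such as apt's package cache.
dd if=/dev/zero of="$work/lower/var/cache/apt/pkgcache.bin" bs=1024 count=1024 2>/dev/null
# Append a single byte to it "inside the container".
copy_up_and_append "$work/lower" "$work/upper" var/cache/apt/pkgcache.bin "x"
layer_bytes "$work/upper"   # prints 1048577: one byte changed, whole file stored
rm -rf "$work"
```

This is why a small package install can produce a large layer: any big file that apt rewrites is stored again in full.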

@iemejia
Author

iemejia commented Dec 8, 2013

@crosbymichael As I said before, I agree with your argument. I know APT downloads a lot of things, but you are ignoring the second part of my question (which was in fact the actual reason I created this issue). My point is that installing a new package that takes an extra 102 kB on a normal filesystem ends up producing a 34 MB diff layer in docker. This is far from optimal (the layer is roughly 30000% of the extra bytes on the filesystem). I imagine you see the importance of this, since those layers are the ones that end up being pushed to the docker index, and reducing them would be an important improvement both for users (less disk) and for you at dotcloud (less bandwidth and disk).

@unclejack AFAIK git also saves the complete version of each revision of the files and assigns a hash to them. That's why I thought that an approach similar to the one git uses when you execute gc (creating a 'smart' packfile with deltas) could be used to reduce the size of docker layers. Another alternative could be to use a more advanced filesystem that does this internally (I don't know).

Anyway, I understand that you are closing this issue since it doesn't break any functionality and is a consequence of a design decision, and it would probably be better to discuss it on the mailing list, but I hope you see the importance of reducing the size of docker layers.

@crosbymichael
Contributor

@iemejia Yes, you're right. It has to do with not being able to do a binary diff on /var/cache/apt/pkgcache.bin. I'm not sure of a workaround for this.

@crosbymichael
Contributor

We have also considered the git pack format for diffing layers. I think this is a good reason to take another look.

@iemejia
Author

iemejia commented Dec 9, 2013

As a follow-up, inspired by the tip in this article, I disabled the apt cache.
https://wiki.ubuntu.com/ReducingDiskFootprint

I added this line to my Dockerfile just after the deb definition

RUN echo 'Dir::Cache { srcpkgcache ""; pkgcache ""; }' > /etc/apt/apt.conf.d/02nocache

The results are more consistent with what I expected, thanks for your help. I hope this tip is useful for you too.

vagrant@precise64:~/test$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
iemejia/test2       latest              4c304bfd9730        7 minutes ago       206.6 MB
iemejia/test        latest              cb3cb83b0ec8        8 minutes ago       205.9 MB
ubuntu              12.04               8dbd9e392a96        8 months ago        128 MB


@iemejia
Author

iemejia commented Dec 9, 2013

btw it's kind of sad that the new docker doesn't show the size of the diff as before. I probably wouldn't have noticed this issue without that.

@crosbymichael
Contributor

@iemejia Run docker history <image> to get detailed information on layer size

@iemejia
Author

iemejia commented Dec 9, 2013

Another interesting idea is to compress the apt lists (/var/lib/apt/lists/) with this line:

RUN echo 'Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/02compress-indexes

But this is probably more useful for the .box and the Vagrantfile, since the initial version of the lists ends up being almost 170 MB of 'wasted space' in the VM.

Thanks for the pointer to the history command.
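
The two apt settings quoted in this thread could be applied together from one small script, so that a Dockerfile needs only a single RUN step for them. The sketch below is one possible way to do that; the function name write_apt_size_tweaks and its root-directory argument are hypothetical, while the file names and contents are copied from the comments above.

```shell
#!/bin/sh
# write_apt_size_tweaks ROOT
# Writes the two apt.conf.d snippets quoted in this thread under ROOT.
# Pass "/" inside an image build, or a scratch directory to try it safely.
# The function name and the ROOT argument are assumptions for illustration.
write_apt_size_tweaks() {
    conf_dir="${1%/}/etc/apt/apt.conf.d"
    mkdir -p "$conf_dir"
    # Do not keep apt's binary package caches (pkgcache.bin, srcpkgcache.bin).
    echo 'Dir::Cache { srcpkgcache ""; pkgcache ""; }' \
        > "$conf_dir/02nocache"
    # Keep the downloaded package lists gzip-compressed on disk.
    echo 'Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";' \
        > "$conf_dir/02compress-indexes"
}

# Possible usage in an image build, before the first apt-get update:
#   write_apt_size_tweaks /
```

Whether the saved space is worth slower apt operations depends on the image; docker history on the resulting image is a quick way to check the effect on each layer.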
