Change dockerfile to use alpine instead of debian #77

Closed
wants to merge 3 commits into from

munntjlx
Contributor

I mostly did this for fun, but I prefer alpine images as they tend to be smaller.

This updates the Dockerfile to use Alpine.
Update Dockerfile - Alpine Base
@bradlarsen
Collaborator

Thanks for the pull request, @munntjlx! I will take a look at this soon.

@bradlarsen
Collaborator

bradlarsen commented Sep 25, 2023

Thank you for the PR, @munntjlx!

I just took a look at this, and the alpine-based image does end up significantly smaller than the debian-based one: 57MB vs 216MB, nearly a 4x reduction.

Frustratingly, the scan performance of the alpine-based image is several times slower! Using cpython as a casual benchmark (30GB of blobs), I see this alpine-based Docker image scan in about 100 seconds, whereas the debian-based Docker image takes about 30 seconds — a 3x performance difference! I re-ran this a handful of times to be sure.

I suspect the performance difference is somehow due to using musl instead of glibc. Some quick searching online indicates that it may be due to the malloc implementation in Alpine having much more contention in multithreaded settings than glibc: https://www.linkedin.com/pulse/testing-alternative-c-memory-allocators-pt-2-musl-mystery-gomes/. This sounds plausible, as Nosey Parker does make heavy use of thread-based parallelism.
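One rough way to check the allocator-contention hypothesis is a small allocation-churn micro-benchmark, run once in each image. The sketch below is not taken from Nosey Parker; the function name `alloc_churn`, the buffer sizes, and the iteration counts are all illustrative assumptions. Each thread does the same amount of work, so with an allocator that scales well the wall-clock time should stay roughly flat as threads are added (up to the core count), whereas heavy lock contention would make it climb.

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

/// Allocate and free `iters` small buffers on each of `threads` threads,
/// returning the wall-clock time until every thread has finished.
fn alloc_churn(threads: usize, iters: usize) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for i in 0..iters {
                    // Small, short-lived heap allocation (64 to 127 bytes).
                    // black_box keeps the optimizer from eliding it.
                    let buf = vec![0u8; 64 + (i % 64)];
                    black_box(&buf);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    start.elapsed()
}

fn main() {
    // Per-thread work is constant; only the thread count varies.
    let iters = 1_000_000;
    for threads in [1, 2, 4, 8] {
        let elapsed = alloc_churn(threads, iters);
        println!("{threads} thread(s): {elapsed:?}");
    }
}
```

This only exercises small allocations, so it is a sanity check on the allocator rather than a substitute for timing a real scan.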

Interesting to see that Nosey Parker can build with musl! That may be relevant for one day producing statically-linked binaries (which is trickier than one would like here, due to a number of native-code dependencies).

I would take this PR if the performance did not drop significantly, as the images are many times smaller. But at present, I value the Nosey Parker scan throughput over container size, so won't be merging this back unless the performance can be addressed.

It may be possible to sidestep the performance drop by switching Nosey Parker to use an alternative allocator such as mimalloc instead. (Nearly all the allocation that Nosey Parker does is in Rust code, not C++.) But this would be a larger-scale investigation.
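For readers unfamiliar with the mechanism: Rust lets a binary replace its heap allocator with a single `#[global_allocator]` static. The sketch below shows that mechanism using only the standard library, with a hypothetical `CountingAlloc` wrapper around the system allocator as a stand-in. With the `mimalloc` crate as a dependency, the declaration would instead be `static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;`. Whether Nosey Parker's own change looks exactly like this is an assumption.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in allocator (hypothetical): forwards every request to the
/// system allocator and counts how many allocations were made.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation made by Rust code in this binary now goes
// through GLOBAL. Swapping in another allocator means changing only
// the type and value of this one static.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Run `f` and report how many allocations happened while it ran
/// (across all threads, since the counter is process-wide).
fn count_allocations<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let value = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (value, after - before)
}

fn main() {
    let (v, n) = count_allocations(|| black_box(vec![1u8; 4096]));
    println!("built a Vec of {} bytes using {} allocation(s)", v.len(), n);
}
```

Because the global allocator only governs allocations made through Rust's allocation API, code in C or C++ dependencies that calls `malloc` directly is unaffected by this static, which is why the remark about most allocation happening in Rust code matters.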

@munntjlx
Contributor Author

I am sad that the performance was so much slower. You CAN build glibc things in Alpine; you just have to add the right packages, which most people forget (there's been a package for glibc stuff for AGES in Alpine). I mostly did it for fun, to see how much smaller the image is. I will probably switch to your default, just that the scans will be so much sadder.

@munntjlx
Contributor Author

Which OS did you test on? As the 'host' OS, that is?

@bradlarsen
Collaborator

@munntjlx this was running on an x86_64 macOS machine. I'll also try on a big Linux machine.

@bradlarsen
Collaborator

@munntjlx switching the global allocator in Nosey Parker to use mimalloc does seem to restore the performance for me in the Alpine Docker image (#81). I'm going to do more performance validation with that in the near future, and if that is indeed a win overall, will then merge this PR. It will take a little while, but keep this PR open :)

@munntjlx
Contributor Author

The main reason I tend to prefer Alpine is the smaller base. It does have a bit of strangeness (Python, for example), but once you get used to working with musl (I am an old hand at OpenWrt), I find that most things work, with a few exceptions. Thanks for being willing to entertain a musl build! The other 'old' complaint among the k8s folks was the lack of support for DNS over TCP, which has been fixed for about 6 months now: musl now directly supports TCP-based DNS requests.

@bradlarsen
Collaborator

FYI I see similar performance characteristics from the Alpine image on a 32-core Ubuntu 22.04 machine -- a 6x slowdown relative to the glibc-based Docker image (~500MB/s vs ~3GB/s scan throughput).

Similarly, switching the global allocator to mimalloc mostly fixes the slowdown there, from ~500MB/s up to 2.6GB/s scan throughput within Docker containers.

Let me get the switchover to mimalloc in, and then get back to this PR.
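To put the figures quoted in this thread side by side, here is a small worked calculation. The numbers are the approximate ones reported above, and treating 1GB/s as 1000MB/s is an assumption; `ratio` is just an illustrative helper.

```rust
/// Ratio of two measurements in the same unit (throughputs or sizes).
fn ratio(a: f64, b: f64) -> f64 {
    a / b
}

fn main() {
    // Approximate scan throughputs on the 32-core Linux machine, in MB/s.
    let glibc = 3000.0;
    let musl = 500.0;
    let musl_mimalloc = 2600.0;

    println!("slowdown with musl's malloc: {:.1}x", ratio(glibc, musl));
    println!("speedup from mimalloc on musl: {:.1}x", ratio(musl_mimalloc, musl));
    println!(
        "share of glibc throughput recovered: {:.0}%",
        100.0 * ratio(musl_mimalloc, glibc)
    );

    // Image sizes reported earlier in the thread, in MB.
    println!("image size reduction: {:.1}x", ratio(216.0, 57.0));
}
```

On these figures, mimalloc on musl gets to roughly 87% of the glibc image's throughput while keeping the image a little under 4x smaller.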

@munntjlx
Contributor Author

Now that #88 is done, can we have a 'separate' Dockerfile for Alpine?

@bradlarsen
Collaborator

@munntjlx Yes, a separate Dockerfile.alpine is my thought. Give me a bit and I'll take care of it.

bradlarsen added a commit that referenced this pull request Oct 16, 2023
This was adapted from PR #77.

Co-authored-by: Thomas Munn <48925191+munntjlx@users.noreply.github.com>
@bradlarsen
Collaborator

@munntjlx I added the new Alpine-based Dockerfile as Dockerfile.alpine in 6bf13dc, now on main.

I'm going to also update the GitHub Actions CI to automatically build both flavors of Dockerfile.

@bradlarsen bradlarsen closed this Oct 16, 2023
@bradlarsen
Collaborator

P.S. @munntjlx I did mark you as a co-author on that commit, so you should get credit for it.

@munntjlx
Contributor Author

Thank you!
