Problem with linux kernel 4.8 #28705

Closed
lmazardo opened this issue Nov 22, 2016 · 15 comments

@lmazardo

commented Nov 22, 2016


BUG REPORT INFORMATION

Description
When I launch bash in a Docker container from the debian:wheezy image on Linux kernel 4.8, it fails with exit status 139.
All is ok with linux-kernel 4.7.

docker run -it debian:wheezy bash
vagrant@debian-testing:~$ echo $?
139

Steps to reproduce the issue:

  1. vagrant up # installs Debian testing with Linux 4.8 - I've uploaded Vagrantfile.txt and bootstrap.sh.txt to set up the vagrant box
  2. vagrant ssh # entering vagrant box with running kernel 4.7
    a. docker run -it debian:wheezy bash # all is ok, the running kernel is still 4.7
    b. sudo reboot
  3. vagrant ssh # entering vagrant box with running kernel 4.8
    a. docker run -it debian:wheezy bash
    b. echo $?
    139
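
For context, an exit status of 139 follows the shell convention of 128 plus a signal number, and signal 11 on Linux is SIGSEGV, so bash inside the container was killed by a segmentation fault. A minimal sketch for decoding such a status (the helper name `signal_from_status` is made up for illustration):

```shell
# Decode a shell exit status above 128 into the signal that killed the process.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # By shell convention, 128 + N means "terminated by signal N".
        kill -l $((status - 128))
    else
        echo "exited normally with status $status"
    fi
}

signal_from_status 139
```

Running this right after the failing `docker run` with `signal_from_status $?` should name the signal instead of the raw number.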

Additional information you deem important (e.g. issue happens only occasionally):
bootstrap.sh.txt
Vagrantfile.txt

Output of docker version:

Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:45:16 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 21:45:16 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 2
 Running: 0
 Paused: 0
 Stopped: 2
Images: 1
Server Version: 1.12.3
Storage Driver: devicemapper
 Pool Name: docker-8:1-262977-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 352.4 MB
 Data Space Total: 107.4 GB
 Data Space Available: 7.905 GB
 Metadata Space Used: 860.2 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.133 (2016-08-15)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay host null bridge
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.8.0-1-amd64
Operating System: Debian GNU/Linux stretch/sid
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 492.3 MiB
Name: debian-testing
ID: BVRA:DBPA:GCAW:Z6LO:BIEE:I64Q:DFSB:53O2:VQFA:OVCH:CZOB:T2PX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):
vagrant with fujimakishouten/debian-stretch64 image

@justincormack

Contributor

commented Nov 22, 2016

Hmm, I am running with upstream kernel 4.8.10 with no issues at all, so it looks like it might be an issue with that Debian kernel. Is it the latest one?

Does it make a difference if you run with --privileged?

@lmazardo

Author

commented Nov 22, 2016

Here is the output of uname -a

Linux debian-testing 4.8.0-1-amd64 #1 SMP Debian 4.8.5-1 (2016-10-28) x86_64 GNU/Linux

There is no difference with --privileged.

@justincormack

Contributor

commented Nov 22, 2016

ping @ijc25

@ijc

Contributor

commented Nov 22, 2016

Something similar was reported to Debian in Debian #845085 which also points to a forum post and tianon/docker-brew-debian#55 (/cc @tianon).

Comparing my local /boot/config-4.7.0-1-amd64 and /boot/config-4.8.0-1-amd64 (I'm still running 4.7, haven't had a chance to reboot yet) the most interesting thing I see is:

 # CONFIG_LEGACY_VSYSCALL_NATIVE is not set
-CONFIG_LEGACY_VSYSCALL_EMULATE=y
-# CONFIG_LEGACY_VSYSCALL_NONE is not set
+# CONFIG_LEGACY_VSYSCALL_EMULATE is not set
+CONFIG_LEGACY_VSYSCALL_NONE=y

Those are described in linux/arch/x86/Kconfig

In particular:

This setting can be changed at boot time via the kernel command
line parameter vsyscall=[native|emulate|none].

So it would be worth booting with each of vsyscall=emulate and vsyscall=native (in two independent tests).
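
To see which default a given kernel was built with, one option is to read its config file directly. The sketch below assumes a Debian-style `/boot/config-<release>` file is present; the helper name `vsyscall_default_mode` is hypothetical and not part of any existing tool.

```shell
# Print the compiled-in default legacy vsyscall mode (e.g. "emulate" or "none")
# from a kernel config file.
vsyscall_default_mode() {
    config_file=$1
    # The Kconfig choice leaves exactly one CONFIG_LEGACY_VSYSCALL_* set to "y";
    # the others appear as "# ... is not set" comments and do not match.
    sed -n 's/^CONFIG_LEGACY_VSYSCALL_\([A-Z]*\)=y$/\1/p' "$config_file" |
        tr '[:upper:]' '[:lower:]'
}

# Example (assumes the config for the running kernel is in /boot):
# vsyscall_default_mode "/boot/config-$(uname -r)"
```

This only reports the build-time default; a `vsyscall=` parameter on the kernel command line overrides it at boot.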

@ijc

Contributor

commented Nov 22, 2016

https://anonscm.debian.org/cgit/kernel/linux.git/commit/?id=2aced7818ac46ca050ee68255ca20eeb14432a95 made this change saying:

+linux (4.8~rc8-1~exp2) UNRELEASED; urgency=medium
+
+  * [amd64] Enable LEGACY_VSYSCALL_NONE instead of LEGACY_VSYSCALL_EMULATE.
+    This breaks (e)glibc 2.13 and earlier, and can be reverted using the kernel
+    parameter: vsyscall=emulate

@lmazardo could you try that please.

@ijc ijc referenced this issue Nov 22, 2016
@lmazardo

Author

commented Nov 22, 2016

On linux kernel 4.8.5-1 and with parameter vsyscall=emulate, it works.

grep GRUB_CMDLINE_LINUX_DEFAULT /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet vsyscall=emulate"
@justincormack

Contributor

commented Nov 22, 2016

@ijc25 what was the rationale for this change? It seems pretty aggressive at not letting older code work, which is unusual for Linux... Is there a security issue?

@ijc

Contributor

commented Nov 22, 2016

AIUI (mainly based on the Kconfig help) it's a security "related" thing because the old setting involves some non-ASLR code in every process address space (vsyscall used to be at a fixed address), so disabling it improves things by getting rid of that.

Older (e)glibc (<=2.13 according to the Debian kernel changelog) is not compatible since it doesn't know about the new dynamic vsyscall address mechanisms and only knows the static one. Looks like CentOS 6 and Debian Wheezy both have old enough libc to be affected.
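
A rough way to test whether a given libc version falls on the affected side of that cutoff, taking the "2.13 and earlier" threshold from the Debian changelog quoted above. This is only a sketch: the helper name `glibc_needs_vsyscall` is invented here, and it relies on `sort -V` from GNU coreutils.

```shell
# Succeed if the given glibc version is 2.13 or older, i.e. old enough to
# depend on the fixed-address vsyscall page per the Debian changelog.
glibc_needs_vsyscall() {
    version=$1
    # sort -V orders version strings; if the input sorts first (or ties with
    # 2.13), it is less than or equal to 2.13.
    oldest=$(printf '%s\n' "$version" 2.13 | sort -V | head -n 1)
    [ "$oldest" = "$version" ]
}

if glibc_needs_vsyscall 2.13; then
    echo "2.13: needs vsyscall=emulate on a LEGACY_VSYSCALL_NONE kernel"
fi
```

The version to pass in would typically come from the image's package metadata or from `ldd --version` run on a host where the image still starts.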

Since Wheezy is now oldstable I suppose that was deemed a reasonable cut off point, especially since there is a command line escape hatch. I wasn't involved/paying attention when this change was made though, so I don't know what the probability of deferring the change for another Debian release would be.

@tianon

Member

commented Nov 22, 2016

@ijc25 I need a way to react to a GitHub comment with more than one heart -- thanks so much for chasing this down and dropping info about it in all the places I've seen it reported before I was even awake! 😄 ❤️ ❤️

IMO, trying to convince Ben to delay this change until stretch+1 is just delaying the inevitable -- I think our efforts would probably be better spent documenting this change and how to override the behavior. 😅

@justincormack

Contributor

commented Nov 22, 2016

Should we add a warning to the check-config script?

@tianon

Member

commented Nov 22, 2016

@ijc

Contributor

commented Nov 24, 2016

I'm unsure what the relationship is with CONFIG_X86_VSYSCALL_EMULATION (a bool option, gated on CONFIG_EXPERT and enabled in all the Debian configs I looked at, including stable, testing and unstable, plus some in-between ones I had lying around).

It seems to have no Kconfig-based relationship with CONFIG_LEGACY_VSYSCALL_*, but I would have thought disabling it would at least make CONFIG_LEGACY_VSYSCALL_EMULATE unavailable. I think it probably doesn't matter from a check-config PoV (or at the very least could be considered separately).

@amluto or @kees would one of you be able to shed some light on the relationship here perhaps?

vieux added a commit to vieux/docker that referenced this issue Nov 28, 2016

Check for LEGACY_VSYSCALL_* options
Choosing LEGACY_VSYSCALL_NONE (over NATIVE or EMULATE) will mean that binaries
using eglibc <= 2.13 will not run (segfault).

Fixes moby#28705.

Signed-off-by: Ian Campbell <ian.campbell@docker.com>
(cherry picked from commit 163db04)
Signed-off-by: Victor Vieux <victorvieux@gmail.com>
@kees

commented Dec 2, 2016

CONFIG_LEGACY_VSYSCALL_NATIVE should be considered a dangerous setting: it provides an ASLR-bypassing target with usable ROP gadgets.

CONFIG_LEGACY_VSYSCALL_NONE is the safest, but it sounds like you have to deal with pre-2.13 glibcs. In that case, the remaining option is fine:

CONFIG_LEGACY_VSYSCALL_EMULATE still carries some ASLR-bypass risk, even if only by providing a known-good place to read a known value from memory.

I would strongly recommend that CONFIG_LEGACY_VSYSCALL_NONE be used and to boot systems that require emulation with "vsyscall=emulate"

xianlubird pushed a commit to xianlubird/docker that referenced this issue Dec 23, 2016

Check for LEGACY_VSYSCALL_* options
Choosing LEGACY_VSYSCALL_NONE (over NATIVE or EMULATE) will mean that binaries
using eglibc <= 2.13 will not run (segfault).

Fixes moby#28705.

Signed-off-by: Ian Campbell <ian.campbell@docker.com>
@jumbojett

commented Dec 22, 2017

Here's how I fixed Alpine. I hope this helps anyone else struggling with this issue.

Edit /boot/grub/grub.cfg. Add vsyscall=emulate at the end of the first menuentry. Then reboot.

Example:

set timeout=2
insmod all_video
menuentry "Alpine Linux" {
        linux /boot/vmlinuz-hardened ...vsyscall=emulate
        ...                                  👆👆👆👆
}

xtreme-stevehiehn added a commit to cloudfoundry-incubator/cfdev that referenced this issue Jan 15, 2018

Use bosh-dns which allows us to remove consul
Given that our VM is using a 4.9.x kernel we need to have vsyscall=emulate
as a kernel cmdline argument. Otherwise bosh-dns will segfault because it is
compiled with an older glibc which cannot handle dynamic vsyscall addressing

Another mitigation would have been to bump glibc to versions > 2.13

See: moby/moby#28705 (comment)

Signed-off-by: Dave Protasowski <dprotaso@gmail.com>

xtreme-stevehiehn added a commit to cloudfoundry-incubator/cfdev that referenced this issue Jan 15, 2018

Use bosh-dns which allows us to remove consul
Given that our VM is using a 4.9.x kernel we need to have vsyscall=emulate
as a kernel cmdline argument. Otherwise bosh-dns will segfault because it is
compiled with an older glibc which cannot handle dynamic vsyscall addressing

Another mitigation would have been to bump glibc to versions > 2.13

See: moby/moby#28705 (comment)

Signed-off-by: Steve Hiehn <shiehn@pivotal.io>
@albertvaka

commented Jul 10, 2019

Note that on GRUB2 the grub.cfg file is meant to be generated automatically by the update-grub scripts, so manual changes to it will be overwritten whenever these run.

Instead, you should edit /etc/default/grub and add the option vsyscall=emulate to the end of GRUB_CMDLINE_LINUX_DEFAULT. It should look something like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet vsyscall=emulate"

Then run sudo update-grub and reboot your computer.
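
After the reboot, it may be worth confirming that the parameter actually reached the running kernel. A small sketch, assuming a Linux host where `/proc/cmdline` is readable; the helper name `vsyscall_from_cmdline` is illustrative only.

```shell
# Extract the value of the vsyscall= parameter from a kernel command line string.
# Prints nothing if the parameter is absent.
vsyscall_from_cmdline() {
    # $1 is left unquoted on purpose so the command line splits on whitespace.
    for arg in $1; do
        case $arg in
            vsyscall=*) echo "${arg#vsyscall=}" ;;
        esac
    done
}

# Example with a literal command line:
vsyscall_from_cmdline "BOOT_IMAGE=/vmlinuz root=/dev/sda1 quiet vsyscall=emulate"

# On the rebooted machine:
# vsyscall_from_cmdline "$(cat /proc/cmdline)"
```

An empty result means the kernel is using its compiled-in default, which on the Debian 4.8 kernel discussed in this thread is `none`.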
