New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/proc/1/environ is unavailable in a container that is not priviledged #6607

Closed
damm opened this Issue Jun 23, 2014 · 39 comments

Comments

Projects
None yet
@damm

damm commented Jun 23, 2014

Hi,

@tianon originally suggested me to read this awhile back; I noticed as of 1.0.1 that I cannot access this file unless the container is privileged.

Anyway we can restore this access or point me to a different way to get the environment variables? (In this case the container is init and upstart calls a upstart job that needs to parse this file to figure out it's linking)

Thanks!

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Jul 2, 2014

Contributor

I agree this is not good, but in this case... assuming you are in pid 1, /proc/self/environ does work.

Contributor

cpuguy83 commented Jul 2, 2014

I agree this is not good, but in this case... assuming you are in pid 1, /proc/self/environ does work.

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm Jul 2, 2014

@cpuguy83

I've been working hard basically to reduce the amount of containers that require privileged mode. That said I really don't want to push more containers into privileged just because of this.

So is there a better suggestion @tianon or @cpuguy83 that I can use to grab the environment variables of the container after it has left the initial process?

damm commented Jul 2, 2014

@cpuguy83

I've been working hard basically to reduce the amount of containers that require privileged mode. That said I really don't want to push more containers into privileged just because of this.

So is there a better suggestion @tianon or @cpuguy83 that I can use to grab the environment variables of the container after it has left the initial process?

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Jul 3, 2014

Contributor

If the final process is a child of the initial process wouldn't it have inherited the environ?
/proc/self/environ should still work in that case... I'm just guessing here, though.

Contributor

cpuguy83 commented Jul 3, 2014

If the final process is a child of the initial process wouldn't it have inherited the environ?
/proc/self/environ should still work in that case... I'm just guessing here, though.

@ibuildthecloud

This comment has been minimized.

Show comment
Hide comment
@ibuildthecloud

ibuildthecloud Jul 3, 2014

Contributor

This is a duplicate of #5936

Contributor

ibuildthecloud commented Jul 3, 2014

This is a duplicate of #5936

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm Jul 3, 2014

@ibuildthecloud this is a recent breakage. It was working on 0.10.1 and 0.11 but it sounds similar.

There is nothing in dmesg to suggest apparmor is involved here.

@cpuguy83 I link several containers and drop configuration settings in the environment variables. From Upstart -> Init job running they get lost.

I would suspect it's upstart loosing them as otherwise all tasks you would think would inherit right?

So when you call /proc/self/environ in the upstart job your not getting them. Sometimes you loose them by running supervisord and forking again and you just need them.

damm commented Jul 3, 2014

@ibuildthecloud this is a recent breakage. It was working on 0.10.1 and 0.11 but it sounds similar.

There is nothing in dmesg to suggest apparmor is involved here.

@cpuguy83 I link several containers and drop configuration settings in the environment variables. From Upstart -> Init job running they get lost.

I would suspect it's upstart loosing them as otherwise all tasks you would think would inherit right?

So when you call /proc/self/environ in the upstart job your not getting them. Sometimes you loose them by running supervisord and forking again and you just need them.

@ibuildthecloud

This comment has been minimized.

Show comment
Hide comment
@ibuildthecloud

ibuildthecloud Jul 3, 2014

Contributor

@damm yes I know that bug was reported against master at the time which was pre-0.12. 0.12 and 1.0 introduced a lot of changes from the security perspective in terms of apparmor, capabilities, locking down sys and proc. As I mentioned in that bug I was trying to narrow it down to what was restricting access. I thought it was apparmor but found that to be wrong.

In the end I couldn't find anyone, on IRC at the time, that could point to why it stopped working.

Contributor

ibuildthecloud commented Jul 3, 2014

@damm yes I know that bug was reported against master at the time which was pre-0.12. 0.12 and 1.0 introduced a lot of changes from the security perspective in terms of apparmor, capabilities, locking down sys and proc. As I mentioned in that bug I was trying to narrow it down to what was restricting access. I thought it was apparmor but found that to be wrong.

In the end I couldn't find anyone, on IRC at the time, that could point to why it stopped working.

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm Jul 3, 2014

@ibuildthecloud yeah I don't have apparmor on Gentoo and I can reproduce it. I believe it's just the reaction to locking everything down.

https://github.com/docker/libcontainer/tree/master/security might be good

damm commented Jul 3, 2014

@ibuildthecloud yeah I don't have apparmor on Gentoo and I can reproduce it. I believe it's just the reaction to locking everything down.

https://github.com/docker/libcontainer/tree/master/security might be good

@crosbymichael crosbymichael added the bug label Jul 7, 2014

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jul 7, 2014

Contributor

Im guessing this is a specific cap issue, i'll dig

Contributor

crosbymichael commented Jul 7, 2014

Im guessing this is a specific cap issue, i'll dig

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jul 11, 2014

Contributor

It looks like this is due to not keeping CAP_SYS_PTRACE

Contributor

crosbymichael commented Jul 11, 2014

It looks like this is due to not keeping CAP_SYS_PTRACE

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jul 11, 2014

Contributor

@rhatdan @vmarmol can either of you think why we would not want to allow this cap?

Contributor

crosbymichael commented Jul 11, 2014

@rhatdan @vmarmol can either of you think why we would not want to allow this cap?

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm commented Jul 11, 2014

Relevent LWN article http://lwn.net/Articles/393010/

@tianon

This comment has been minimized.

Show comment
Hide comment
@tianon

tianon Jul 11, 2014

Member

It sounds to me like the pid namespace helps here.

Member

tianon commented Jul 11, 2014

It sounds to me like the pid namespace helps here.

@ibuildthecloud

This comment has been minimized.

Show comment
Hide comment
@ibuildthecloud

ibuildthecloud Jul 11, 2014

Contributor

Well shoot, I can't believe I didn't notice this, but strace doesn't work either due do this.

Contributor

ibuildthecloud commented Jul 11, 2014

Well shoot, I can't believe I didn't notice this, but strace doesn't work either due do this.

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm Jul 11, 2014

@ibuildthecloud based on that article yup.

@inconshreveable indeed :(

damm commented Jul 11, 2014

@ibuildthecloud based on that article yup.

@inconshreveable indeed :(

@inconshreveable

This comment has been minimized.

Show comment
Hide comment
@inconshreveable

inconshreveable Jul 11, 2014

@damm wrong autocomplete? don't think I'm helpful here

inconshreveable commented Jul 11, 2014

@damm wrong autocomplete? don't think I'm helpful here

@vmarmol

This comment has been minimized.

Show comment
Hide comment
@vmarmol

vmarmol Jul 11, 2014

Contributor

I am weary of giving more privileges by default until we have user namespaces. There was the discussion about allowing non-dropped capabilities to be specified. I think that would be a better solution for this no?

Contributor

vmarmol commented Jul 11, 2014

I am weary of giving more privileges by default until we have user namespaces. There was the discussion about allowing non-dropped capabilities to be specified. I think that would be a better solution for this no?

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 11, 2014

Contributor

I am not sure which capability is causing but I would figure this is SYS_PTRACE. This blocks reading certain data in /proc. We want to allow processes inside the container to ptrace inside the container, but block them from ptracing processes outside of the container. The other thing SYS_PTRACE blocks is looking to see where the process memory offset. With the information a hacker could figure how how to configure his buffer overflow.

Could someone try to build a docker with SYS_PTRACE and see if this fixes the problem, or use the --add-cap SYS_PTRACE if you have this patch to see if this fixes the problem.

Contributor

rhatdan commented Jul 11, 2014

I am not sure which capability is causing but I would figure this is SYS_PTRACE. This blocks reading certain data in /proc. We want to allow processes inside the container to ptrace inside the container, but block them from ptracing processes outside of the container. The other thing SYS_PTRACE blocks is looking to see where the process memory offset. With the information a hacker could figure how how to configure his buffer overflow.

Could someone try to build a docker with SYS_PTRACE and see if this fixes the problem, or use the --add-cap SYS_PTRACE if you have this patch to see if this fixes the problem.

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jul 11, 2014

Contributor

@rhatdan yes, it's SYS_PTRACE. So do think this this is acceptable to add as a default or should we get the --cap-add flag in where the user can add that cap if they require reading from proc?

Contributor

crosbymichael commented Jul 11, 2014

@rhatdan yes, it's SYS_PTRACE. So do think this this is acceptable to add as a default or should we get the --cap-add flag in where the user can add that cap if they require reading from proc?

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 11, 2014

Contributor

I have no problem with allowing it, by default.

Contributor

rhatdan commented Jul 11, 2014

I have no problem with allowing it, by default.

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jul 11, 2014

Contributor

I would lean more towards using the --cap-add flag to add this cap ( SYS_PTRACE ) instead of enabling it by default. The other issues about FSETID I would vote for it to be in the default list because it's producing filesystems with weird permissions

Contributor

crosbymichael commented Jul 11, 2014

I would lean more towards using the --cap-add flag to add this cap ( SYS_PTRACE ) instead of enabling it by default. The other issues about FSETID I would vote for it to be in the default list because it's producing filesystems with weird permissions

@vmarmol

This comment has been minimized.

Show comment
Hide comment
@vmarmol

vmarmol Jul 11, 2014

Contributor

I am also on the side of less privileges if we can :) especially with ptrace, it seems like that is something you know you want (not always the case for all the caps).

Contributor

vmarmol commented Jul 11, 2014

I am also on the side of less privileges if we can :) especially with ptrace, it seems like that is something you know you want (not always the case for all the caps).

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 11, 2014

Contributor

I believe CAP_SYS_PTRACE allows a process to ptrace a process owned by a different UID and to look at special fields in /proc. The lack of it does not prevent a process from ptracing another process with the same UID. If that was the case a normal user would not be allowed to run gdb or strace.

Contributor

rhatdan commented Jul 11, 2014

I believe CAP_SYS_PTRACE allows a process to ptrace a process owned by a different UID and to look at special fields in /proc. The lack of it does not prevent a process from ptracing another process with the same UID. If that was the case a normal user would not be allowed to run gdb or strace.

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm Jul 11, 2014

Based on the LWN Article I posted above this capability tries to handle a lot. Maybe trying to get new capabilities to overlap this ones features and then start using those?

It seems to me that CAP_SYS_PTRACE is a large thing and for just reading /proc/1/environ adding such a large capability is :(

damm commented Jul 11, 2014

Based on the LWN Article I posted above this capability tries to handle a lot. Maybe trying to get new capabilities to overlap this ones features and then start using those?

It seems to me that CAP_SYS_PTRACE is a large thing and for just reading /proc/1/environ adding such a large capability is :(

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jul 11, 2014

Contributor

@damm yes, I think we will not add this one in the default whitelist but you should be able to use --cap-add SYS_PTRACE if you want this functionality

Contributor

crosbymichael commented Jul 11, 2014

@damm yes, I think we will not add this one in the default whitelist but you should be able to use --cap-add SYS_PTRACE if you want this functionality

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm Jul 11, 2014

@crosbymichael I'd still suggest a new capability. Basically even if we --cap-add eventually most people will do this and it will become a security issue. That or give me a way to re-read the environment variables without this capability?

I'm cool with that I just want a path forward that won't break in the future. I just don't have the time to spend testing and hacking on Docker to know what will break in the future.

16:23 you can get at it with something like: cat /proc/1/environ | tr '\000' '\n'

damm commented Jul 11, 2014

@crosbymichael I'd still suggest a new capability. Basically even if we --cap-add eventually most people will do this and it will become a security issue. That or give me a way to re-read the environment variables without this capability?

I'm cool with that I just want a path forward that won't break in the future. I just don't have the time to spend testing and hacking on Docker to know what will break in the future.

16:23 you can get at it with something like: cat /proc/1/environ | tr '\000' '\n'

@vmarmol

This comment has been minimized.

Show comment
Hide comment
@vmarmol

vmarmol Jul 11, 2014

Contributor

@damm are you referring to a new system capability? That may be out of scope for Docker :) I'm not convinced most people will add that capability in the long run. Lots of workloads don't require this.

You're asking for an API to get the original Docker environment from when you started? What about running a wrapper that dumps this somewhere on startup and reads it back after the supervisor comes up? Excuse if I'm just being dumb and am missing something :D trying to catchup on the thread.

Contributor

vmarmol commented Jul 11, 2014

@damm are you referring to a new system capability? That may be out of scope for Docker :) I'm not convinced most people will add that capability in the long run. Lots of workloads don't require this.

You're asking for an API to get the original Docker environment from when you started? What about running a wrapper that dumps this somewhere on startup and reads it back after the supervisor comes up? Excuse if I'm just being dumb and am missing something :D trying to catchup on the thread.

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm Jul 12, 2014

@vmarmol

Sometimes you start runit; or monit and it will unset system environment variables. Sure I could write a wrapper for /custom_init that would dump them.

It's available under /proc/1/environ. This is a text file that you can re-read to ensure you have your environment variables loaded. Supervisord can sometimes unset them; Monit does unset.

Having the ability to read the environment variables (without hackery or dumb shell tricks) should be trivial (as it was trivial).

P.S. I really don't want to run my containers with higher privileges I just want this danged file. I was told it should be safe in the future; it still works except for all the cap changes.

damm commented Jul 12, 2014

@vmarmol

Sometimes you start runit; or monit and it will unset system environment variables. Sure I could write a wrapper for /custom_init that would dump them.

It's available under /proc/1/environ. This is a text file that you can re-read to ensure you have your environment variables loaded. Supervisord can sometimes unset them; Monit does unset.

Having the ability to read the environment variables (without hackery or dumb shell tricks) should be trivial (as it was trivial).

P.S. I really don't want to run my containers with higher privileges I just want this danged file. I was told it should be safe in the future; it still works except for all the cap changes.

@crosbymichael

This comment has been minimized.

Show comment
Hide comment
@crosbymichael

crosbymichael Jul 14, 2014

Contributor

We just merged #6968 so I think this can be solved by using the --cap-add SYS_PTRACE flag to get the ability to read from proc and strace. I don't think we want to add this to the default whitelist so this flag lets you do what you need for this specific issue.

If you want to keep the env vars with using supervisor or some other init system maybe you need your entrypoint to dump the env to a readable file then exec your init system.

Contributor

crosbymichael commented Jul 14, 2014

We just merged #6968 so I think this can be solved by using the --cap-add SYS_PTRACE flag to get the ability to read from proc and strace. I don't think we want to add this to the default whitelist so this flag lets you do what you need for this specific issue.

If you want to keep the env vars with using supervisor or some other init system maybe you need your entrypoint to dump the env to a readable file then exec your init system.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 16, 2014

Contributor

We have found other problems with /proc/1, which I believe is a kernel issue. I have asked some kernel engineers to lookinto it.

Contributor

rhatdan commented Jul 16, 2014

We have found other problems with /proc/1, which I believe is a kernel issue. I have asked some kernel engineers to lookinto it.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 16, 2014

Contributor

ls -lZ /proc/1 >/dev/null
ls: cannot read symbolic link /proc/1/cwd: Permission denied
ls: cannot read symbolic link /proc/1/root: Permission denied
ls: cannot read symbolic link /proc/1/exe: Permission denied

Contributor

rhatdan commented Jul 16, 2014

ls -lZ /proc/1 >/dev/null
ls: cannot read symbolic link /proc/1/cwd: Permission denied
ls: cannot read symbolic link /proc/1/root: Permission denied
ls: cannot read symbolic link /proc/1/exe: Permission denied

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 16, 2014

Contributor

eparis> if (task->mm)
dumpable = get_dumpable(task->mm);
rcu_read_lock();
if (dumpable != SUID_DUMP_USER &&
!ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
rcu_read_unlock();
return -EPERM;

Might be in this kernel code.

Contributor

rhatdan commented Jul 16, 2014

eparis> if (task->mm)
dumpable = get_dumpable(task->mm);
rcu_read_lock();
if (dumpable != SUID_DUMP_USER &&
!ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
rcu_read_unlock();
return -EPERM;

Might be in this kernel code.

@crosbymichael crosbymichael reopened this Jul 16, 2014

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 16, 2014

Contributor

Ok the issue seems to be the clearing of the CapPrm, CapEff, CapBnd
docker run --rm -ti fedora grep ^Cap /proc/self/status
CapInh: 00000000880425fb
CapPrm: 00000020880425fb
CapEff: 00000020880425fb
CapBnd: 00000020880425fb

Notice the 0000002

The problem is

Contributor

rhatdan commented Jul 16, 2014

Ok the issue seems to be the clearing of the CapPrm, CapEff, CapBnd
docker run --rm -ti fedora grep ^Cap /proc/self/status
CapInh: 00000000880425fb
CapPrm: 00000020880425fb
CapEff: 00000020880425fb
CapBnd: 00000020880425fb

Notice the 0000002

The problem is

@eparis

This comment has been minimized.

Show comment
Hide comment
@eparis

eparis Jul 16, 2014

Contributor

The docker daemon only clears the bset up to what it believes to be CAP_LAST_CAP. Which it apparently thinks is CAP_BLOCK_SUSPEND (or bit 36). However 3.15/3.16 has a new capability CAP_AUDIT_READ (bit 37)

Unlike to prm,eff,inh capability sets, the bset is cleared one bit at a time using a call to prctl(PR_CAPBSET_DROP, $CAPABILITY,...)

There seem to be 2 possible suggestions. Keep clearing capabilities past CAP_LAST_CAP blindly hoping that you got them all. There's probably some way to tell from the error code when you got them all, but I'm not sure...

Do something like libcap-ng. It collects the entire existing bset using PR_CAPBSET_READ and then goes about unsetting every bit. It does not pay attention to CAP_LAST_CAP at all. I think this is a better solution, but don't really know what is happening today...

Contributor

eparis commented Jul 16, 2014

The docker daemon only clears the bset up to what it believes to be CAP_LAST_CAP. Which it apparently thinks is CAP_BLOCK_SUSPEND (or bit 36). However 3.15/3.16 has a new capability CAP_AUDIT_READ (bit 37)

Unlike to prm,eff,inh capability sets, the bset is cleared one bit at a time using a call to prctl(PR_CAPBSET_DROP, $CAPABILITY,...)

There seem to be 2 possible suggestions. Keep clearing capabilities past CAP_LAST_CAP blindly hoping that you got them all. There's probably some way to tell from the error code when you got them all, but I'm not sure...

Do something like libcap-ng. It collects the entire existing bset using PR_CAPBSET_READ and then goes about unsetting every bit. It does not pay attention to CAP_LAST_CAP at all. I think this is a better solution, but don't really know what is happening today...

@jessfraz

This comment has been minimized.

Show comment
Hide comment
@jessfraz

jessfraz Feb 25, 2015

Contributor

ping @crosbymichael status on this?

Contributor

jessfraz commented Feb 25, 2015

ping @crosbymichael status on this?

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm Feb 25, 2015

My interests were resolved after knowing that it requires a privileged container; there are bugs with certain kernel versions that make this different.

damm commented Feb 25, 2015

My interests were resolved after knowing that it requires a privileged container; there are bugs with certain kernel versions that make this different.

@jessfraz jessfraz closed this Mar 2, 2015

@rthille

This comment has been minimized.

Show comment
Hide comment
@rthille

rthille Feb 18, 2016

I'm seeing a similar issue where after a process makes the setresuid call it can't read it's own environment, and root can't read it either.

rthille commented Feb 18, 2016

I'm seeing a similar issue where after a process makes the setresuid call it can't read it's own environment, and root can't read it either.

@damm

This comment has been minimized.

Show comment
Hide comment
@damm

damm Feb 18, 2016

@rthille how about filing a new issue with more information on this? you can reference this issue of course.

A few other issues have been referenced previously; so if it's happening again a fresh issue might get new eyes.

damm commented Feb 18, 2016

@rthille how about filing a new issue with more information on this? you can reference this issue of course.

A few other issues have been referenced previously; so if it's happening again a fresh issue might get new eyes.

@rthille

This comment has been minimized.

Show comment
Hide comment
@rthille

rthille Feb 18, 2016

I'll work up a small program to confirm what I'm seeing tomorrow and file a new issue. Thanks.

rthille commented Feb 18, 2016

I'll work up a small program to confirm what I'm seeing tomorrow and file a new issue. Thanks.

@rthille

This comment has been minimized.

Show comment
Hide comment
@rthille

rthille Feb 18, 2016

Never mind, my "bug" turned out to be exactly this issue, I was just testing improperly.

rthille commented Feb 18, 2016

Never mind, my "bug" turned out to be exactly this issue, I was just testing improperly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment