-
-
Notifications
You must be signed in to change notification settings - Fork 33.2k
gh-128432: Add umask
context manager to shutil
#128433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tests are marked as skipped for Windows, WASI, and Emscripten as they don't use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me. Couple of small notes inline, and I think this would be worth adding to what's new (API additions are pretty much always worth mentioning)
Lib/shutil.py
Outdated
return None | ||
|
||
|
||
class umask(AbstractContextManager): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining the magic methods is enough to pass the subclass/instance check:
Line 34 in ebadfd1
@classmethod |
So we can skip the import and explicit inheritance from the ABC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class umask(AbstractContextManager): | |
class temp_umask: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially resolved by dc1dcc0. I think we should keep inheriting from the ABC; isn't that sort of what it's for?
from shutil import ExecError | ||
|
||
|
||
@unittest.skipIf(os.name != "posix" or support.is_wasi or support.is_emscripten, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tangent: we probably need a separate issue to update the os.umask
docs with an explanation of the subset of its functionality that actually works on Windows (if any).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@serhiy-storchaka made a valid comment on one of the Discourse threads that we should avoid using the bare umask
name for this, since shutil.chown
directly invokes os.chown
, just with support for named users and groups in addition to their respective numeric IDs.
I've made suggestions inline for switching to temp_umask
(which is what test.support
uses for this functionality). While I'm open to other naming suggestions, "temporary" seems like the clearest way to refer to setting the umask and reverting the change when done, so I like it better than any of the other options that have occurred to me (like active_umask
, set_umask
, update_umask
).
Misc/NEWS.d/next/Library/2025-01-03-01-12-47.gh-issue-128432.8kg5DK.rst
Outdated
Show resolved
Hide resolved
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Doc/library/shutil.rst
Outdated
Using :class:`umask` like this is better practice than first creating the file, | ||
and later changing its permissions with :func:`~os.chmod`, between which a | ||
period of time exists in which the file may have too lenient permissions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation is lacking. It doesn't explain why os.open()
with optional permission mode passed directly to the underlying C open(2)
/ openat(2)
is a bad solution.
I would argue it's always better than a context manager since it is thread- and generator-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having to pass the permission mode to open
means that has to be done everywhere in a block of code, whereas using a umask
context manager allows the code inside the context to be written agnostically. Perhaps I should word that into the docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved by df59fdeacb4b4ba4fcd78fc15440bac4c9090a71
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but I don't really see how df59fde answers my question. It just asserts that umask is better, no discussion of tradeoffs despite the non-obvious nature of them.
Maybe it would help you understand where I'm coming from if I frame this as someone who wants to be convinced that he should, in fact, use this new function at all.
I was happy with os.open()
. In the context of writing security-sensitive software, I know that os.open is guaranteed to do what it asserts to do. I don't see that surety with a umask-based context manager -- what am I missing?
It's important to note that the sole purpose of a umask context manager as described in this PR is for security hardening, since there's a race condition using the naive "first open() it, then os.chmod() it". But in the security domain, one tends to make different tradeoffs compared to other fields, and in particular, we want to be very explicit about the security properties of the code we are writing. Writing more lines of code is not a downside, not if it allows you to be more secure. With that in mind, I don't understand why it's a bad thing to "need to pass a custom mode
keyword argument to :func:open
every time.
If this is just about how much you have to type, I really don't see any valid reason to ever use this context manager. I can save a couple keystrokes at each open()
site, but in return I have to do deep soul-searching about whether I expect the code to be callable from a thread (and decide whether that means my application is insecure under freethreading and needs to be rewritten using os.open()
after all) and whether I need to avoid calling any function that recursively uses this context manager from a generator or async function. That's a pretty steep cost.
"Explicit is better than implicit" is a particularly good rule for security!
...
As a context manager it is also ill-suited to defense-in-depth, i.e. "this program is security-sensitive, better set the umask everywhere just in case we create files". If that property is desirable, I would want to call os.umask()
once at program startup, not use a context manager at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So again: why should this context manager exist at all? My perspective is that this context manager is a bad idea and is dangerous for security and the mere fact of its existence will encourage people to write insecure software. This is perhaps a bit cynical of me, so the first thing I did when I saw the PR proposing it was to check the documentation to see whether it explained a use case. Maybe if I see a well-documented use case I will come around to the idea, yeah?
But at the moment I still don't see anything to persuade me that the function is a net benefit. So I would personally avoid it on the grounds that I believe it to be a dangerous footgun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note additionally that umask is entirely ignored if a default ACL exists which is more permissive than your umask. However, directly applying os.open(..., mode=0o600)
will actually work, even with an ACL.
In many cases, you're not concerned about an ACL...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've bookmarked this discussion and I'll read through it later (head spinning with git {merge,rebase,cherry-pick} and I'm about to get off).
I'll re-mark this conversation as unresolved, and read through these comments in the morning; there's clearly intricacies to discuss. Just acknowledging I've seen this! 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for considering. :) It may be a moot point as @ncoghlan has now closed the linked issue: #128432 (comment)
(I think that my mention of ACLs was the deciding factor.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment on the discussion thread has made me re-think and close the issue. The required functionality now needs two separate contexts: set_umask
and increase_umask
: "With nested use, I'd want the mask to be as restrictive as if all umasks had been applied, because each level of nesting had it's own reason to restrict."
My resolution is to make a separate PyPI package, that keeps these intricacies outside of the Python stdlib.
I mentioned systemd.exec
's UMask
functionality just to highlight that a 'global' (whole-runtime-long) umask for a service can be set outside of the program itself, instead of the program having to call umask at startup; it's more agnostic than shoving a umask 0022
on line 2 of every script. systemd does a lot of daemonization stuff for the user automatically these days, whereas other system service managers may differ (I didn't know which service manager Gentoo users use sorry). e.g.: I serve my Flask apps with gunicorn
+ systemd service units. If one wanted to daemonize gunicorn from the CLI instead there's a daemonize helper, which closes fds and sets the umask etc. So either way umask is handled outside of the Flask app itself.
I regard consolidating to a single entry/exit point the job of main()
Therefore a set_umask
context manager wasn't about wrapping main
, either via the service manager, or a server which daemonizes from the CLI, or setting umask ones' self directly with os.umask
; it's about using it as a short-term context:
e.g.: sockets. One can't use open()
with a custom mode to create a socket afaik. The only way to bind
an AF_UNIX socket to a path is to call bind()
, which respects the user's umask. Looking again at gunicorn
's binding method they set the umask, do some stuff, then revert. One could bind() -> chmod() -> listen()
instead, without fear of a too-loosely permissioned socket being accessed by someone else between bind()
and chmod()
, because listen()
needs to be called before anyone can connect to the socket, but umask() -> bind()/open() -> umask()
is an idiom that could apply to all file types, not just sockets (KISS). I guess it's up to personal taste.
I will die on this hill.
Security concerns are paramount, got it.
PEP 703
I don't know too much about the technical details of the GIL. My understanding is that with free-threading, each thread will be able to run simultaneously on the CPU without a GIL wrangling the threads' accessing of the heap / environment / interpreter? If you mean to suggest 'each thread should use open(mode=...) instead of messing with the process-wide umask' then yes you're quite right that's easier and safer.
ACL
I have only explicitly used ACL a couple of times, for allowing the http
user and nobody else to access .htpasswd
files stored safely in a directory. From what I can tell from setfacl(1)
the mask is &
-ed with the ACL permissions granted, restricting permissions further. In a non-ACL setting this is the behavior that the discuss.python.org commenter expected, warranting a separate increase_umask
context. It's a good job you two made these comments which uphold each other. Many thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g.: sockets. One can't use
open()
with a custom mode to create a socket afaik. The only way tobind
an AF_UNIX socket to a path is to callbind()
, which respects the user's umask. Looking again atgunicorn
's binding method they set the umask, do some stuff, then revert. One couldbind() -> chmod() -> listen()
instead, without fear of a too-loosely permissioned socket being accessed by someone else betweenbind()
andchmod()
, becauselisten()
needs to be called before anyone can connect to the socket, butumask() -> bind()/open() -> umask()
is an idiom that could apply to all file types, not just sockets (KISS). I guess it's up to personal taste.
On linux, you should rather os.fchmod()
after socket()
and before bind()
.
I don't know too much about the technical details of the GIL. My understanding is that with free-threading, each thread will be able to run simultaneously on the CPU without a GIL wrangling the threads' accessing of the heap / environment / interpreter?
The threading
module has historically been fairly unpopular in comparison to multiprocessing
, since the GIL wrangles access by simply preventing python code from running at all.
Free-threading removes this limitation, so we can expect threading to become a lot more popular in python land. :D
I have only explicitly used ACL a couple of times, for allowing the
http
user and nobody else to access.htpasswd
files stored safely in a directory. From what I can tell fromsetfacl(1)
the mask is&
-ed with the ACL permissions granted, restricting permissions further. In a non-ACL setting this is the behavior that the discuss.python.org commenter expected, warranting a separateincrease_umask
context. It's a good job you two made these comments which uphold each other. Many thanks!
ACLs can have:
- special permission grants on a user by user basis
- a "default ACL" which is essentially "umask but per directory instead of per process". When both a default-ACL and a umask exist, the umask is ignored, and the default-ACL is applied as a mask instead.
So if there is an ACL that is more restrictive than the default umask 0022, but more permissive than the secure umask you want to use in your application / library, your extra-secure umask is ignored and the default-ACL mask which isn't quite as good ends up winning, even though you conceptually applied the umask after the ACL.
with umask_of(0o077):
do_private_stuff() Seems Pythonic enough? Let me know.
|
I have made the requested changes; please review again |
Thanks for making the requested changes! @ncoghlan: please review the changes made to this pull request. |
|
||
It also allows you to write code that creates files, in a way that is agnostic | ||
of permissions -- that is without the need to pass a custom ``mode`` keyword | ||
argument to :func:`open` every time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to mention the reason I didn't even question the utility of the context manager once you proposed it: changing the process umask means that even files created by code in extension modules, third party dependencies, or by invoked subprocesses, will be affected by the umask (since changing the umask affects the actual process state, and is inherited by child processes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the touch_file
example given below this paragraph is sufficient to indicate this
The mechanics of how umask
is inherited by subprocesses would be more suited in Doc/library/os.rst
than here
Currently debating:
https://discuss.python.org/t/add-umask-to-contextlib/75809/41 |
Closed temporarily whilst debating the addition of a valuable abstract class
|
Cancelled by conclusion of #128432 |
Closes 128432
umask
context manager toshutil
#128432📚 Documentation preview 📚: https://cpython-previews--128433.org.readthedocs.build/