Skip to content

Conversation

jb2170
Copy link
Contributor

@jb2170 jb2170 commented Jan 3, 2025

@jb2170
Copy link
Contributor Author

jb2170 commented Jan 3, 2025

Tests are marked as skipped for Windows, WASI, and Emscripten as they don't use umask(2) in a Posix-y way.

@ZeroIntensity ZeroIntensity self-requested a review January 3, 2025 04:46
Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me. Couple of small notes inline, and I think this would be worth adding to what's new (API additions are pretty much always worth mentioning)

Lib/shutil.py Outdated
return None


class umask(AbstractContextManager):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining the magic methods is enough to pass the subclass/instance check:

@classmethod

So we can skip the import and explicit inheritance from the ABC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class umask(AbstractContextManager):
class temp_umask:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially resolved by dc1dcc0. I think we should keep inheriting from the ABC; isn't that sort of what it's for?

from shutil import ExecError


@unittest.skipIf(os.name != "posix" or support.is_wasi or support.is_emscripten,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tangent: we probably need a separate issue to update the os.umask docs with an explanation of the subset of its functionality that actually works on Windows (if any).

@ncoghlan ncoghlan added type-feature A feature request or enhancement 3.14 bugs and security fixes labels Jan 3, 2025
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits.

Copy link
Contributor

@ncoghlan ncoghlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@serhiy-storchaka made a valid comment on one of the Discourse threads that we should avoid using the bare umask name for this, since shutil.chown directly invokes os.chown, just with support for named users and groups in addition to their respective numeric IDs.

I've made suggestions inline for switching to temp_umask (which is what test.support uses for this functionality). While I'm open to other naming suggestions, "temporary" seems like the clearest way to refer to setting the umask and reverting the change when done, so I like it better than any of the other options that have occurred to me (like active_umask, set_umask, update_umask).

@bedevere-app
Copy link

bedevere-app bot commented Jan 3, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Using :class:`umask` like this is better practice than first creating the file,
and later changing its permissions with :func:`~os.chmod`, between which a
period of time exists in which the file may have too lenient permissions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation is lacking. It doesn't explain why os.open() with optional permission mode passed directly to the underlying C open(2) / openat(2) is a bad solution.

I would argue it's always better than a context manager since it is thread- and generator-safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to pass the permission mode to open means that has to be done everywhere in a block of code, whereas using a umask context manager allows the code inside the context to be written agnostically. Perhaps I should word that into the docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved by df59fdeacb4b4ba4fcd78fc15440bac4c9090a71

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I don't really see how df59fde answers my question. It just asserts that umask is better, no discussion of tradeoffs despite the non-obvious nature of them.

Maybe it would help you understand where I'm coming from if I frame this as someone who wants to be convinced that he should, in fact, use this new function at all.

I was happy with os.open(). In the context of writing security-sensitive software, I know that os.open is guaranteed to do what it asserts to do. I don't see that surety with a umask-based context manager -- what am I missing?

It's important to note that the sole purpose of a umask context manager as described in this PR is for security hardening, since there's a race condition using the naive "first open() it, then os.chmod() it". But in the security domain, one tends to make different tradeoffs compared to other fields, and in particular, we want to be very explicit about the security properties of the code we are writing. Writing more lines of code is not a downside, not if it allows you to be more secure. With that in mind, I don't understand why it's a bad thing to "need to pass a custom mode keyword argument to :func:open every time.

If this is just about how much you have to type, I really don't see any valid reason to ever use this context manager. I can save a couple keystrokes at each open() site, but in return I have to do deep soul-searching about whether I expect the code to be callable from a thread (and decide whether that means my application is insecure under freethreading and needs to be rewritten using os.open() after all) and whether I need to avoid calling any function that recursively uses this context manager from a generator or async function. That's a pretty steep cost.

"Explicit is better than implicit" is a particularly good rule for security!

...

As a context manager it is also ill-suited to defense-in-depth, i.e. "this program is security-sensitive, better set the umask everywhere just in case we create files". If that property is desirable, I would want to call os.umask() once at program startup, not use a context manager at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So again: why should this context manager exist at all? My perspective is that this context manager is a bad idea and is dangerous for security and the mere fact of its existence will encourage people to write insecure software. This is perhaps a bit cynical of me, so the first thing I did when I saw the PR proposing it was to check the documentation to see whether it explained a use case. Maybe if I see a well-documented use case I will come around to the idea, yeah?

But at the moment I still don't see anything to persuade me that the function is a net benefit. So I would personally avoid it on the grounds that I believe it to be a dangerous footgun.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note additionally that umask is entirely ignored if a default ACL exists which is more permissive than your umask. However, directly applying os.open(..., mode=0o600) will actually work, even with an ACL.

In many cases, you're not concerned about an ACL...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've bookmarked this discussion and I'll read through it later (head spinning with git {merge,rebase,cherry-pick} and I'm about to get off).

I'll re-mark this conversation as unresolved, and read through these comments in the morning; there's clearly intricacies to discuss. Just acknowledging I've seen this! 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for considering. :) It may be a moot point as @ncoghlan has now closed the linked issue: #128432 (comment)

(I think that my mention of ACLs was the deciding factor.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on the discussion thread has made me re-think and close the issue. The required functionality now needs two separate contexts: set_umask and increase_umask: "With nested use, I'd want the mask to be as restrictive as if all umasks had been applied, because each level of nesting had it's own reason to restrict."

My resolution is to make a separate PyPI package, that keeps these intricacies outside of the Python stdlib.

I mentioned systemd.exec's UMask functionality just to highlight that a 'global' (whole-runtime-long) umask for a service can be set outside of the program itself, instead of the program having to call umask at startup; it's more agnostic than shoving a umask 0022 on line 2 of every script. systemd does a lot of daemonization stuff for the user automatically these days, whereas other system service managers may differ (I didn't know which service manager Gentoo users use sorry). e.g.: I serve my Flask apps with gunicorn + systemd service units. If one wanted to daemonize gunicorn from the CLI instead there's a daemonize helper, which closes fds and sets the umask etc. So either way umask is handled outside of the Flask app itself.

I regard consolidating to a single entry/exit point the job of main()

Therefore a set_umask context manager wasn't about wrapping main, either via the service manager, or a server which daemonizes from the CLI, or setting umask ones' self directly with os.umask; it's about using it as a short-term context:

e.g.: sockets. One can't use open() with a custom mode to create a socket afaik. The only way to bind an AF_UNIX socket to a path is to call bind(), which respects the user's umask. Looking again at gunicorn's binding method they set the umask, do some stuff, then revert. One could bind() -> chmod() -> listen() instead, without fear of a too-loosely permissioned socket being accessed by someone else between bind() and chmod(), because listen() needs to be called before anyone can connect to the socket, but umask() -> bind()/open() -> umask() is an idiom that could apply to all file types, not just sockets (KISS). I guess it's up to personal taste.

I will die on this hill.

Security concerns are paramount, got it.

PEP 703

I don't know too much about the technical details of the GIL. My understanding is that with free-threading, each thread will be able to run simultaneously on the CPU without a GIL wrangling the threads' accessing of the heap / environment / interpreter? If you mean to suggest 'each thread should use open(mode=...) instead of messing with the process-wide umask' then yes you're quite right that's easier and safer.

ACL

I have only explicitly used ACL a couple of times, for allowing the http user and nobody else to access .htpasswd files stored safely in a directory. From what I can tell from setfacl(1) the mask is &-ed with the ACL permissions granted, restricting permissions further. In a non-ACL setting this is the behavior that the discuss.python.org commenter expected, warranting a separate increase_umask context. It's a good job you two made these comments which uphold each other. Many thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.: sockets. One can't use open() with a custom mode to create a socket afaik. The only way to bind an AF_UNIX socket to a path is to call bind(), which respects the user's umask. Looking again at gunicorn's binding method they set the umask, do some stuff, then revert. One could bind() -> chmod() -> listen() instead, without fear of a too-loosely permissioned socket being accessed by someone else between bind() and chmod(), because listen() needs to be called before anyone can connect to the socket, but umask() -> bind()/open() -> umask() is an idiom that could apply to all file types, not just sockets (KISS). I guess it's up to personal taste.

On linux, you should rather os.fchmod() after socket() and before bind().

I don't know too much about the technical details of the GIL. My understanding is that with free-threading, each thread will be able to run simultaneously on the CPU without a GIL wrangling the threads' accessing of the heap / environment / interpreter?

The threading module has historically been fairly unpopular in comparison to multiprocessing, since the GIL wrangles access by simply preventing python code from running at all.

Free-threading removes this limitation, so we can expect threading to become a lot more popular in python land. :D

I have only explicitly used ACL a couple of times, for allowing the http user and nobody else to access .htpasswd files stored safely in a directory. From what I can tell from setfacl(1) the mask is &-ed with the ACL permissions granted, restricting permissions further. In a non-ACL setting this is the behavior that the discuss.python.org commenter expected, warranting a separate increase_umask context. It's a good job you two made these comments which uphold each other. Many thanks!

ACLs can have:

  • special permission grants on a user by user basis
  • a "default ACL" which is essentially "umask but per directory instead of per process". When both a default-ACL and a umask exist, the umask is ignored, and the default-ACL is applied as a mask instead.

So if there is an ACL that is more restrictive than the default umask 0022, but more permissive than the secure umask you want to use in your application / library, your extra-secure umask is ignored and the default-ACL mask which isn't quite as good ends up winning, even though you conceptually applied the umask after the ACL.

@jb2170
Copy link
Contributor Author

jb2170 commented Jan 3, 2025

@ncoghlan

  1. I think I'd prefer a more idiomatic name of umask_of instead of temp_umask. It sounds too similar to the folder /tmp. In particular if chdir is to follow in a similar way then temp_chdir sounds like it would chdir into /tmp.
with umask_of(0o077):
    do_private_stuff()

Seems Pythonic enough? Let me know.

  1. I'd like to keep the inheritance from AbstractContextManager. I'm currently working on another codebase (WTForms) and none of its validators inherit from a base class, which has led to some wild inconsistency over ~15 years.

  2. I'll commit the docs nitpick corrections. I'll also leave out the 'such as' clause for now in the docs, though I do want to press for moving chdir's move after this PR.

@jb2170
Copy link
Contributor Author

jb2170 commented Jan 4, 2025

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Jan 4, 2025

Thanks for making the requested changes!

@ncoghlan: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from ncoghlan January 4, 2025 04:57

It also allows you to write code that creates files, in a way that is agnostic
of permissions -- that is without the need to pass a custom ``mode`` keyword
argument to :func:`open` every time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to mention the reason I didn't even question the utility of the context manager once you proposed it: changing the process umask means that even files created by code in extension modules, third party dependencies, or by invoked subprocesses, will be affected by the umask (since changing the umask affects the actual process state, and is inherited by child processes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the touch_file example given below this paragraph is sufficient to indicate this

The mechanics of how umask is inherited by subprocesses would be more suited in Doc/library/os.rst than here

@jb2170
Copy link
Contributor Author

jb2170 commented Jan 6, 2025

Currently debating:

  • Class name of context manager (and naming convention for similar context managers eg substitute_cwd and maybe substitute_euid)
  • Defining and inheriting from a new base class SubstitutionContext(Manager)

https://discuss.python.org/t/add-umask-to-contextlib/75809/41

@jb2170
Copy link
Contributor Author

jb2170 commented Jan 6, 2025

Closed temporarily whilst debating the addition of a valuable abstract class AbstractSubstitutionContextManager to contextlib.py.

shutil.substitute_umask would be (re)based off of its admission. AbstractSubstitutionContextManager would be the parent of

  • contextlib._RedirectStream, parent of redirect_stdout, redirect_stderr
  • shutil.substitute_umask
  • shutil.substitute_euid TBC
  • shutil.substitute_egid TBC
  • shutil.substitute_cwd (formerly contextlib.chdir) TBC

@jb2170 jb2170 closed this Jan 6, 2025
@jb2170
Copy link
Contributor Author

jb2170 commented Jan 8, 2025

Cancelled by conclusion of #128432

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3.14 bugs and security fixes awaiting change review type-feature A feature request or enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants