Un-deprecate functional API for importlib resources & add subdirectory support #116608

encukou · 2024-03-11T16:03:03Z

Feature or enhancement

Proposal:

The importlib.resources functions {open,read}_{text,binary}, path, is_resource and contents, deprecated in 3.11 and removed in 3.13 alphas, are, anecdotally, missed by quite a few users.
They provide a simple API for simple tasks, while the full-featured Traversable API is better suited for complex ones -- especially for implementing new resources-aware loaders.

I'm now in a position where I can add these functions back and support them.

Their main drawback -- not allowing subdirectories -- can be solved by taking multiple path components as positional arguments, for example:

importlib.resources.read_text('modulename', 'subdirectory', 'subsubdir', 'resource.txt')

The additional arguments (encoding and errors) would become keyword-only.

There is a wrinkle in this: in Python 3.9-3.11, the above would mean:

importlib.resources.read_text(
    'modulename', 'subdirectory',
    encoding='subsubdir',
    errors='resource.txt',
)
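The positional binding described above can be checked without touching any resources. This is a minimal sketch: `legacy_read_text` is a hypothetical stand-in that only mirrors the 3.9-3.11 parameter layout (`package, resource, encoding, errors`) and reads nothing.

```python
import inspect


def legacy_read_text(package, resource, encoding="utf-8", errors="strict"):
    """Hypothetical stand-in with the 3.9-3.11 parameter order; does no I/O."""


# Bind the four positional arguments from the proposal to the old layout.
bound = inspect.signature(legacy_read_text).bind(
    "modulename", "subdirectory", "subsubdir", "resource.txt"
)
for name, value in bound.arguments.items():
    print(f"{name} = {value!r}")
```

Under the old layout the third and fourth path components land in `encoding` and `errors`, which is the ambiguity the rest of this proposal works around.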

~~I believe that this is acceptable, since:~~

~~pragmatically: typical file names do not match typical encoding/errorhandler names~~
~~lawyerly: the functions have already been deprecated for 2 releases; no one is using them now, right?~~

~~However, if this is a problem, I can~~

[edit: This is solved by:]

making the encoding argument required when a text-reading function is given more than one path component.
planning to lift this limitation around 3.15.

importlib.resources.read_text(
    'modulename', 'subdirectory', 'subsubdir', 'resource.txt',
    encoding='utf-8',
)
importlib.resources.read_text('modulename', 'resource.txt')  # OK
importlib.resources.read_text('modulename', 'subdirectory', 'utf-8')  # error
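One way the rule above could be enforced is with a sentinel default, so that "encoding not passed" can be told apart from any real value. This is a sketch under assumptions, not necessarily the merged implementation: `_MISSING`, `resolve_encoding` and `read_text_args` are hypothetical names, and no file is actually read.

```python
_MISSING = object()  # sentinel meaning "the caller did not pass encoding"


def resolve_encoding(path_names, encoding=_MISSING):
    """Pick the encoding, insisting on an explicit one for multi-part paths."""
    if encoding is _MISSING:
        if len(path_names) > 1:
            # In 3.9-3.11 the second path name would have been the encoding,
            # so refuse to guess.
            raise TypeError("'encoding' argument required with multiple path names")
        return "utf-8"
    return encoding


def read_text_args(anchor, *path_names, encoding=_MISSING, errors="strict"):
    """Return the arguments a real reader would use; performs no I/O."""
    return anchor, path_names, resolve_encoding(path_names, encoding), errors


print(read_text_args("modulename", "resource.txt"))
print(read_text_args("modulename", "subdirectory", "resource.txt", encoding="utf-8"))
```

With this shape, `read_text_args('modulename', 'subdirectory', 'utf-8')` raises `TypeError` instead of silently treating `'utf-8'` as a file name.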

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/deprecating-importlib-resources-legacy-api/11386/29

Linked PRs


FFY00 · 2024-03-11T17:07:22Z

Wouldn't this be a bit late for that? We already went through the deprecation period, and removed the feature in the alpha releases, bringing them back now would be a bit confusing.

The importlib.resources functions {open,read}_{text,binary}, path, is_resource and contents, deprecated in 3.11 and removed in 3.13 alphas, are, anecdotally, missed by quite a few users.

Can you actually show a couple examples of this affecting users downstream? I think that's the most viable argument to bring that API back.

zooba · 2024-03-11T17:07:55Z

taking multiple path components as positional arguments

Why not just take multiple components with separators in a single argument? It's easy enough to require forward slash, disallow .. and even to normalise backslashes on Windows if you feel like it.

If they didn't allow subdirectories before (I never noticed, tbh), then presumably using a slash here would have either failed completely or worked. Either way, we can enable them in a new release.

(And add me to the anecdotal list of people who missed them. It's easy enough to add a few lines of code to bring them back, which is how I have been handling it so far, but I'd be happier to have those few lines in the stdlib.)

jaraco · 2024-03-11T21:41:38Z

At least one audience that would like to keep the legacy APIs is in mesonbuild/meson#12401.

I admit, I prefer this approach over keeping the legacy APIs with the cruft that it still had lying around. It adds a mostly-compatible layer and restores these wrappers in a supported way.

On one hand, this approach violates the "preferably one way" to do things; users will need to decide which way works best for them, creating a variety of supported approaches. On the other hand, I do appreciate that it offers a friendlier interface for certain operations (esp. path(...)).

FFY00 and I put a lot of work into this deprecation process, so it'll be disappointing to now see this reversed at the last minute, but it does feel like the right thing to do, especially since someone else is willing to own the implementation (thanks encukou!). We will have to backport the change to importlib_resources, but that should be fairly straightforward.

Overall, I'm +0 on the change. I'd really like to see more vocal support from other core devs before committing to this approach.

encukou · 2024-03-12T08:47:07Z

I've made the encoding argument mandatory for _text functions when multiple path names are given.

Wouldn't this be a bit late for that?

Yes, sorry. Previously I couldn't commit to supporting this API.

Why not just take multiple components with separators in a single argument?

I'd rather not derail discussion on this issue. Support for separators can be added later if necessary. If it is, allowing multiple arguments will still be useful.

FFY00 and I put a lot of work into this deprecation process

Sorry to hear that. Sunk costs suck :(
This makes it seem that implementing the deprecation process was as time-consuming as keeping the API working, or more so. That's not a good situation to be in, especially considering all the work users need to put in to update their code.

pfmoore · 2024-03-12T10:02:30Z

(And add me to the anecdotal list of people who missed them. It's easy enough to add a few lines of code to bring them back, which is how I have been handling it so far, but I'd be happier to have those few lines in the stdlib.)

I’ll add a “me too” here as well. Being able to do simple things simply is an advantage.

pradyunsg · 2024-03-12T13:15:07Z

+1 from me on the (updated) proposed API as well as un-deprecating these -- for reasons that have been discussed on the d.p.o thread as well as mentioned here by others.

eli-schwartz · 2024-03-12T14:13:19Z

Their main drawback -- not allowing subdirectories -- can be solved by taking multiple path components as positional arguments, for example:
importlib.resources.read_text('modulename', 'subdirectory', 'subsubdir', 'resource.txt')

I've always sort of wondered why this is a drawback at all, compared to simply doing this:

with importlib.resources.path('modulename.subdirectory.subsubdir', 'resources.txt') as f:
    ...

I'm not objecting to the new API! It's more ergonomic than pretending everything is a namespace module. But for backwards compatibility with python < 3.13 it seems practical to use the two-argument form, and the lack of a new API doesn't seem like it should have been a killer problem before now.

zooba · 2024-03-12T14:59:18Z

I don't understand the reason we can't reimplement it as:

def read_text(module, filename, *args, **kwargs): #use proper args if you want here, I just don't know them all off the top of my head
    with (path(module) / filename).open("r", *args, **kwargs) as f:
        return f.read()

Why do we need the module and filename as multiple args instead of just two?
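For reference, a runnable variant of the two-argument sketch above, assuming `importlib.resources.files()` as the entry point (the `path(module) / filename` spelling in the comment is shorthand). The package name `demo_pkg_twoarg` and its `note.txt` are hypothetical and exist only for this demonstration.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path


def read_text(package, resource, encoding="utf-8", errors="strict"):
    """Two-argument form: one package, one resource name."""
    with (files(package) / resource).open("r", encoding=encoding, errors=errors) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway package on disk: demo_pkg_twoarg/note.txt
    pkg = Path(tmp) / "demo_pkg_twoarg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "note.txt").write_text("hello\n", encoding="utf-8")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        content = read_text("demo_pkg_twoarg", "note.txt")
    finally:
        # Undo the import-system changes made for the demo.
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg_twoarg", None)

print(repr(content))
```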

zooba · 2024-03-12T15:01:56Z

I'd rather not derail discussion on this issue.

How is it derailing this issue? You're bringing back an API, which I like, and changing the design in a potentially backwards-incompatible way in the process, which I don't. Why is it derailing to ask why it has to have a different design now?

barneygale · 2024-03-12T18:45:14Z

Why do we need the module and filename as multiple args instead of just two?

I'm catching myself up on this. I think the answer is (somewhere) in this thread: https://gitlab.com/python-devs/importlib_resources/-/issues/58

jaraco · 2024-03-12T19:06:07Z

Why do we need the module and filename as multiple args instead of just two?

I'm catching myself up on this. I think the answer is (somewhere) in this thread: https://gitlab.com/python-devs/importlib_resources/-/issues/58

Which was migrated to python/importlib_resources#58.

barneygale · 2024-03-12T19:47:49Z

IIUC:

Support for resources in subdirectories required new APIs, e.g. something resembling Traversable.iterdir().
The pathlib API was considered a good fit and chosen for the task.
The functional interface wasn't enhanced to support subdirectories; instead it was earmarked for eventual removal.

So there's perhaps four levels of support we could offer for the functional APIs:

Deprecate + remove (@jaraco's original plan, already landed)
Restore old APIs + functionality as it was.
Restore old APIs, and add support for subdirectories to existing functions (this issue and @encukou's PR)
Restore old APIs, and add support for subdirectories, including adding new functions where needed (e.g. functional equivalent of traversable.iterdir())

Personally I'd lean towards option 2. If folks need subdirectory support they can use the OOP API - that's its whole reason to exist!

jaraco · 2024-03-12T20:00:17Z

Early in the design, I'd deemed it infeasible to implement path for anything that's expecting subdirectories, because what does it mean to get a path for a directory? And indeed, the replacement, as_file(files(...)), would only allow access to a single file on disk, even though the Traversable API provided access to traverse through a tree of files. Later (and only fairly recently in python/importlib_resources#255), as_file was expanded to add support for directories (by manifesting a directory and all of its contents in a temporary directory when needed). Until that support was added, the meaning of path() for a directory would have been broken. I believe that explains why the functional API was intended to be deprecated but only later seems viable to be revived with subdirectory support.

pfmoore · 2024-03-12T20:01:30Z

I'm going to backtrack slightly on my support for this. I do find the Traversable API conceptually complex when all I want to do is "read a file from my package" - the functional API was simpler, and as I said above, I believe that "simple things should be simple to do". So I remain +1 on un-deprecating that API.

The complexities around "add subdirectory support" are where I'm less certain. I like the idea of supporting subdirectories without needing the full Traversable API, but I'm not entirely happy with the proposal here (specifically the somewhat-weird compromise of making the encoding parameter mandatory). As an alternative, what's wrong with allowing /-separated resource names, and translating them into the equivalent Traversable invocation? This has, by definition, no more security issues than the joinpath approach, so the problems discussed in the linked issue are addressed at least as well as the Traversable API addresses them. And it matches what people were reported as expecting to be able to do with the functional API.

So there's perhaps four levels of support we could offer for the functional APIs:

Deprecate + remove (@jaraco's original plan, already landed)

Restore old APIs + functionality as it was.

Restore old APIs, and add support for subdirectories to existing functions (this issue and @encukou's PR)

Restore old APIs, and add support for subdirectories, including adding new functions where needed (e.g. functional equivalent of traversable.iterdir())

Personally I'd lean towards option 2. If folks need subdirectory support they can use the OOP API - that's its whole reason to exist!

I would be happy with that.

I will note that option 2 was essentially what a lot of people asked for in the Discourse thread at the time of the original deprecation, and it felt very much to me that positions got entrenched because the comments were possibly too heated. Maybe with the passage of time, and experience of the impact, all we're doing here is confirming that it turned out that the wrong choice was made? Luckily, we still have time to reverse that choice before the functional API actually gets removed.

jaraco · 2024-03-12T20:07:16Z

One issue that was discussed in comments leading up to this comment was that, by adding support for subdirectories, functions like path and read_text could be providing access to content outside of the package. The files() API avoided this issue by leaving it up to the caller to do any traversal. If there are plans for path() and other calls to accept path parameters, it will re-open those (possibly security-related) concerns.

pfmoore · 2024-03-12T20:24:45Z

I don't see how - you shouldn't be accepting untrusted input to the Traversable API in just the same way as for the functional API. joinpath("..") is no more secure than a .. component in a /-separated path, surely?

jaraco · 2024-03-12T20:29:53Z

I don't see how - you shouldn't be accepting untrusted input to the Traversable API in just the same way as for the functional API. joinpath("..") is no more secure than a .. component in a /-separated path, surely?

I'm not convinced it's a concern meriting intervention, but it was something that @warsaw raised.

barneygale · 2024-03-12T21:32:33Z

From the linked comment:

I'll assume that the implementation prevents navigating to non-subdirectories; i.e. you can't escape the enclosing resource package directory, or find your way to /etc, etc. :)

If I'm right in thinking that files() returns a pathlib.Path or zipfile.Path, then in the former case, nothing prevents users from "escaping" the resource root by calling joinpath('..') or joinpath('/etc/passwd'), right?
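The pathlib behaviour behind this question can be shown with pure paths, which never touch the filesystem. A small sketch, assuming a hypothetical resource root of `/srv/app/pkg`; `PurePosixPath` is used so the result is the same on every platform.

```python
from pathlib import PurePosixPath

root = PurePosixPath("/srv/app/pkg")  # hypothetical resource root

# A ".." component is kept as-is, so the joined path points above the root
# once the operating system resolves it.
up = root.joinpath("..")

# Joining an absolute path discards the root entirely.
absolute = root.joinpath("/etc/passwd")

print(up)
print(absolute)
```

Neither call raises: plain path objects apply no containment check of their own.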

zooba · 2024-03-12T21:36:44Z

Maybe, but in cases where that would work, there are easier ways for someone to do it.

Much easier to say that resources shouldn't be looked up using untrusted input. The vast majority of uses will be using static strings here anyway, because arbitrary files do not exist in your own package.

But I believe it uses a different type anyway and only converts to an actual path at the very end, so it may not even be possible to traverse outside of the package. Which makes this entirely a moot point (and would be a great design if so).

eli-schwartz · 2024-03-12T21:38:39Z

I'm pretty sure it produces a traversable that also inherits from Path and can be used as either one (modulo isinstance checks to see whether it's from pathlib or zipfile for cases where the difference matters).

I'd assume that the standard path joining operations use the traversable impl which then checks for attempts to escape the resource root.

eli-schwartz · 2024-03-12T21:44:38Z

Maybe, but in cases where that would work, there are easier ways for someone to do it.

Much easier to say that resources shouldn't be looked up using untrusted input. The vast majority of uses will be using static strings here anyway, because arbitrary files do not exist in your own package.

Right, there's no reason someone couldn't be passing the same inputs to os.path.join and then to builtin open() directly, instead of importlib.resources, so it's basically splitting hairs to say that the resources API needs super special protections for security.

... Also a distraction from the main topic I guess.

encukou · 2024-03-13T10:15:24Z

@jaraco, do you think allowing path separators is OK? I'm happy to delegate the decision to you.

My opinion, for what it's worth: it is OK.
I don't think we need to normalize slashes or disallow backslashes. I've often wished for a more opinionated resource system that would enforce names that work the same on all platforms (e.g. [a-z0-9]+ segments separated by [-/._]), as that would eliminate common beginner packaging mistakes. But, importlib.resources isn't that, and I don't think it should try to be that.
I can see the case for avoiding path traversal attacks (disallowing ..), though. Tests will catch a platform-specific FileNotFoundError, but they won't catch a vulnerability.

I prefer allowing multiple *path_parts, as in pathlib's joinpath(), but I'm OK deferring that to a follow-up PR.
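To make the `..` idea above concrete, here is one possible shape for such a check. It is purely illustrative: `reject_escaping_names` is a hypothetical helper, and nothing in this thread says a check like it was adopted.

```python
import posixpath


def reject_escaping_names(*path_names):
    """Raise ValueError for names that could leave the anchor package.

    Hypothetical helper: rejects absolute names and any ".." component.
    """
    for name in path_names:
        if name.startswith(posixpath.sep):
            raise ValueError(f"absolute resource name not allowed: {name!r}")
        if ".." in name.split(posixpath.sep):
            raise ValueError(f"'..' component not allowed: {name!r}")
    return path_names


print(reject_escaping_names("subdirectory", "subsubdir/resource.txt"))
```

The sketch only splits on forward slashes, so a backslash-separated `..` on Windows would pass through; a real check would need to decide how to treat that case.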

jaraco · 2024-03-13T14:26:03Z

But I believe it uses a different type anyway and only converts to an actual path at the very end

In actuality, all importlib.resources.files() returns in most cases is a pathlib.Path or a zipfile.Path. There's no wrapper or additional protections, such as to enforce that only the Traversable interface is supplied. In fact in python/importlib_resources#291, I explored wrapping the result in something that extended the interface but found it is difficult or impossible to do in general. The Traversable interface exists mainly for providers to know the minimum interface they must supply, but for stdlib loaders, the Path classes provide that interface. If we wanted to provide traversal limitations for objects returned from files(), we'd probably want to augment both pathlib.Path and zipfile.Path to supply those limitations and then enable that behavior in files().

If I'm right in thinking that files() returns a pathlib.Path or zipfile.Path, then in the former case, nothing prevents users from "escaping" the resource root by calling joinpath('..') or joinpath('/etc/passwd'), right?

Correct.

@jaraco, do you think allowing path separators is OK?

Yes, I believe it's okay. I believe there's existing precedent for supporting it, in part because 'pathlib.Path' supports it, but also because it's explicitly documented that Traversables should support paths separated by posixpath.sep (forward slash, platform independent).

I've often wished for a more opinionated resource system that would enforce names that work the same on all platforms (e.g. [a-z0-9]+ segments separated by [-/._]), as that would eliminate common beginner packaging mistakes.

importlib.resources has been somewhat opinionated, as seen above, by supporting one path separator (posixpath.sep, which often works universally), but unopinionated in that it allows any valid path names or separators. That is, importlib.resources doesn't want to get into the business of regulating what is a valid resource. If a use-case demands to have names that contain spaces or unicode characters as they're meaningful to the application, they should be allowed to do so.

I prefer allowing multiple *path_parts, as in pathlib's joinpath(), but I'm OK deferring that to a follow-up PR.

Since Traversable.joinpath accepts (*descendants), it should be readily supportable. That interface only evolved that behavior somewhat recently, so it may not be available in all Pythons (and thus custom providers may not be reasonably expected to fully support it yet).

One more comment - you may consider contributing this change to importlib_resources simultaneous to or prior to submitting it with CPython as importlib_resources provides a fuller test suite, providing coverage on older Pythons and checks for other aspects like code coverage, type checking, style consistency, and more. Totally your call, though.

encukou · 2024-03-20T14:54:09Z

Since Traversable.joinpath accepts (*descendants), it should be readily supportable.

Oh, that's great! I'll add it to the docs.

The semantics are clear then: I'll delegate to joinpath :)
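A small sketch of what delegating to `joinpath` looks like for a package on the filesystem, where `files()` hands back a `pathlib.Path`. The package `demo_pkg_joinpath` and its contents are hypothetical and built in a temporary directory just for the demonstration; custom loaders may behave differently.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway package: demo_pkg_joinpath/data/sub/hello.txt
    pkg = Path(tmp) / "demo_pkg_joinpath"
    (pkg / "data" / "sub").mkdir(parents=True)
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "data" / "sub" / "hello.txt").write_text("hi\n", encoding="utf-8")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        root = files("demo_pkg_joinpath")
        # Several path components as separate arguments...
        via_parts = root.joinpath("data", "sub", "hello.txt").read_text(encoding="utf-8")
        # ...or one argument with forward slashes.
        via_slash = root.joinpath("data/sub/hello.txt").read_text(encoding="utf-8")
    finally:
        # Undo the import-system changes made for the demo.
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg_joinpath", None)

print(via_parts == via_slash)
```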

you may consider contributing this change to importlib_resources

Roger, will send the patch there too.

encukou · 2024-03-21T10:31:12Z

I've updated the PR to allow path separators.

A PR for importlib_resources is here: python/importlib_resources#303


encukou · 2024-04-05T11:57:23Z

Thanks for the discussion everyone, and sorry I couldn't make everyone happy.

I merged the PR. If you disagree with that, please preferably continue the discussion here: https://discuss.python.org/t/11386/47

encukou · 2024-04-05T13:17:20Z

Apparently the tests fail on big-endian machines; will fix

…GH-117569) gh-116609: Ignore UTF-16 BOM in importlib.resources._functional tests To test the `errors` argument, we read a UTF-16 file as UTF-8 with "backslashreplace" error handling. However, the utf-16 codec adds an endian-specific byte-order mark, so on big-endian machines the expectation doesn't match the test file (which was saved on a little-endian machine). Use endswith to ignore the BOM.
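The byte-order issue in that commit message can be reproduced in a few lines. This is a sketch, not the actual test: the sample text is made up, and the "file" is built in memory with an explicit little-endian BOM to stand in for a file saved on a little-endian machine.

```python
import codecs

text = "Hello"

# Bytes as a little-endian machine would have saved them: BOM, then payload.
file_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Reading UTF-16 bytes as UTF-8 with "backslashreplace" turns the two
# undecodable BOM bytes into escape sequences at the start of the string.
decoded = file_bytes.decode("utf-8", errors="backslashreplace")

# The plain "utf-16" codec writes a BOM in the machine's native byte order,
# so an expectation built with it differs between platforms.
native_bom = text.encode("utf-16")[:2]

# Comparing only the tail, encoded with an explicit byte order, avoids the BOM.
payload = text.encode("utf-16-le").decode("utf-8", errors="backslashreplace")
print(decoded.endswith(payload))
print(native_bom == codecs.BOM_UTF16_LE)  # depends on the machine's byte order
```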


encukou added type-feature A feature request or enhancement topic-importlib labels Mar 11, 2024

encukou added a commit to encukou/cpython that referenced this issue Mar 11, 2024

pythongh-116608: Bring back importlib.resources functional API

bdc3209

bedevere-app bot mentioned this issue Mar 11, 2024

gh-116608: Bring back importlib.resources functional API #116609

Merged

jaraco mentioned this issue Mar 20, 2024

gh-106531: Remove importlib.resources._legacy #106532

Merged

encukou mentioned this issue Mar 21, 2024

Add functional API python/importlib_resources#303

Merged

encukou added a commit that referenced this issue Apr 5, 2024

gh-116608: importlib.resources: Un-deprecate functional API & add sub…

9ceaee7

…directory support (GH-116609)

encukou closed this as completed Apr 5, 2024

encukou reopened this Apr 5, 2024

bedevere-app bot mentioned this issue Apr 5, 2024

gh-116608: Ignore UTF-16 BOM in importlib.resources._functional tests #117569

Merged

encukou closed this as completed Apr 8, 2024

diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024

pythongh-116608: importlib.resources: Un-deprecate functional API & a…

53963b5

…dd subdirectory support (pythonGH-116609)
