Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heads up: pathlib.VirtualPath may be coming soon #105

Closed
barneygale opened this issue Jul 6, 2023 · 20 comments · Fixed by #112
Closed

Heads up: pathlib.VirtualPath may be coming soon #105

barneygale opened this issue Jul 6, 2023 · 20 comments · Fixed by #112
Assignees
Labels

Comments

@barneygale
Copy link

barneygale commented Jul 6, 2023

I've logged a CPython PR that adds a private pathlib._VirtualPath class:

python/cpython#106337

If/when that lands, it's not much more work to drop the underscore and introduce pathlib.VirtualPath to the world. This would be Python 3.13 at the earliest.

Would it be suitable for use in pathy? It would be great to hear your feedback!

@justindujardin
Copy link
Owner

Any public API that Pathy could hang off of would be very useful. Please feel free to update this issue when there is a build version that I can use.

@barneygale
Copy link
Author

barneygale commented Dec 27, 2023

Hello - I've published CPython's private PathBase class in a PyPI package: https://pypi.org/project/pathlib-abc/0.1.0/. No docs yet but I should have them ready soon.

If the PyPI package succeeds and matures, I'm hopeful we can make PathBase public in Python itself. Let me know if you have any problems/questions? Thanks!

@justindujardin
Copy link
Owner

justindujardin commented Jan 8, 2024

I was about to resort to doing something similar to this package but baked into Pathy, so you saved me a lot of time and duplication. It's especially nice knowing this might feed back into Python and that this package could serve as a fallback after that.

I will spend some time today looking into integrating this with Pathy and ask questions as they arise.

Thanks for publishing a package that contains this functionality 🙇

@justindujardin justindujardin self-assigned this Jan 10, 2024
@justindujardin
Copy link
Owner

@barneygale, So far, so good; there are some things I have to refactor to account for the pathlib_abc library not providing a concrete file system implementation, but I think it will ultimately simplify things.

The main question that's come up is whether you plan to add type hints or would be willing to accept a PR to add them. I haven't delved deep into why, but mypy/Pylance both dislike the pathlib_abc module here and there, warning about Unknown base class types and improper subclass signatures. I can add type ignores or reimplement the properties to add signatures, but that feels kind of icky.

Here are some examples of what I mean

PathBase.root is partially known: str | bytes | Unknown
image

Overridden methods have an implicit return type that conflicts with the concrete implementation expectations
image

The typing bits aside, there are no blockers so far; I should be able to open a PR soon.

@justindujardin
Copy link
Owner

I'm adding support for __lt__ to support sorted, and I notice the pathlib class has "_cparts" cached property that's used in the < operator:

    @property
    def _cparts(self):
        # Cached casefolded parts, for hashing and comparison
        try:
            return self._cached_cparts
        except AttributeError:
            self._cached_cparts = self._flavour.casefold_parts(self._parts)
            return self._cached_cparts

Can you tell me anything about the significance of this cache? I'm all for caches when needed, but that you've left it out of the ABC library makes me wonder how much it contributes? Also, I see that you've cached most properties in the ABC classes, so I wonder if there's maybe a simpler solution you had in mind with comparable performance using the existing cached properties?

@barneygale
Copy link
Author

It's only important in pathlib because Windows paths need to perform case-insensitive comparisons, and even then I'm not sure if it has a huge effect!

FWIW I've just removed most/all caching in the ABCs. PyPI release coming up...

@justindujardin
Copy link
Owner

One troubling thing I found is that Pathy will no longer inherit from pathlib.Path while using this library. This comes with some negative side-effects, like I noted in another issue, where a serialization library that spaCy uses has a function to coerce strings to Path objects if they're not already that type:

def force_path(location, require_exists=True):
    if not isinstance(location, Path):
        location = Path(location)
    if require_exists and not location.exists():
        raise ValueError(f"Can't read file: {location}")
    return location

Pathy objects used to make it through unscathed, but now they get corrupted and transformed into Path objects that don't work.

Pathy isn't so widely used that it will be a significant problem, but it's something to keep in mind. Do you see any workarounds I might be missing that could keep this compatibility?

@barneygale
Copy link
Author

barneygale commented Jan 10, 2024

That sounds like a bug in spacy, judging by this bit of their CONTRIBUTING.md:

Code that interacts with the file-system should accept objects that follow the
pathlib.Path API, without assuming that the object inherits from pathlib.Path.
If the function is user-facing and takes a path as an argument, it should check
whether the path is provided as a string. Strings should be converted to
pathlib.Path objects.

(edit: sorry, just read your comment on the issue and realised you're ahead of me!)

They might be interested in depending on pathlib_abc - I'll reach out.

@justindujardin
Copy link
Owner

justindujardin commented Jan 10, 2024

Yeah, it was just an example of the type of side-effect that could happen in other libraries. They have a more "correct" implementation in spaCy directly, i.e.

def ensure_path(path: Any) -> Any:
    """Ensure string is converted to a Path.

    path (Any): Anything. If string, it's converted to Path.
    RETURNS: Path or original argument.
    """
    if isinstance(path, str):
        return Path(path)
    else:
        return path

So this is probably a bug in srsly and how it interacts with spaCy.

@barneygale
Copy link
Author

With pathlib_abc that method could be implemented as follows:

from pathlib_abc import PathBase
from pathlib import Path

def ensure_path(path: PathBase | os.PathLike) -> PathBase:
    if not isinstance(path, PathBase):
        path = Path(path)
    return path

But srsly seems to be a low-dependencies affair, so it could instead adopt spaCy's string check behaviour.

@barneygale
Copy link
Author

In general I don't think it's correct to inherit from pathlib.Path unless your object represents a local filesystem path. That's sort-of the whole point of Path.

@justindujardin
Copy link
Owner

justindujardin commented Jan 10, 2024

In general I don't think it's correct to inherit from pathlib.Path unless your object represents a local filesystem path.

I agree with you on that. I always felt awkward using internals and inheriting behavior that didn't apply to my use-case. Still, sometimes we work with solutions that aren't ideal because they're easily grasped.

Moving to pathlib_abc feels much cleaner. It will be nice not to refactor Pathy again next year and the year after with each new Python release because the pathlib internals were refactored. 😅

@justindujardin
Copy link
Owner

justindujardin commented Jan 11, 2024

I've finished work on the PR to replace pathlib.Path base with pathlib_abc. While it's not required, @barneygale if you'd like to review my changes, I'd welcome any feedback you have.

FYI: I pinned the version of pathlib_abc, so if you make significant changes, let me know, and I'll update it.

Copy link

🎉 This issue has been resolved in version 0.11.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@justindujardin
Copy link
Owner

@barneygale the pathlib_abc experience has been great so far. Reopening this in the event you wanted to follow up more. Github auto closed it because I noted it in the PR. 😅

One thing I noticed while working on this is that pathlib_abc has some methods that only make sense for file systems. For cloud storage they're just extra code that wouldn't really work as expected if you called it.

I took a quick pass and trimmed the PathBase class def to include the methods I think could be removed and exist solely in pathlib proper. They don't harm Pathy, but they don't help either.

class PathBase(PurePathBase):
    def is_mount(self):
        ...
    def is_symlink(self):
        ...
    def is_junction(self):
        ...
    def is_block_device(self):
        ...
    def is_char_device(self):
        ...
    def is_fifo(self):
        ...
    def is_socket(self):
        ...
    def readlink(self):
        ...
    def symlink_to(self, target, target_is_directory=False):
        ...
    def hardlink_to(self, target):
        ...

None of the cloud providers support symbolic links, so at best we could implement it with application logic and metadata. That might be a good argument for keeping it, given the utility of symlinks.

@barneygale
Copy link
Author

If you implement stat() you should find that the is_ methods return False, and you can ignore the other methods if your filesystem doesn't support em (they'll raise UnsupportedOperation). They make sense in some situations, e.g. TarPath in the docs supports most of these methods.

@justindujardin
Copy link
Owner

They make sense in some situations, e.g. TarPath in the docs supports most of these methods.

Sure, that makes sense. Like I said, they don't hurt anything, I just wanted to provide some feedback.

What's your stance on conda-forge? Pathy has a conda recipe and it can't resolve pathlib_abc 😓 I'm not sure how many people consume pathy through conda-forge.

@barneygale
Copy link
Author

FYI, I've published pathlib-abc 0.2.0. Changelog: https://pathlib-abc.readthedocs.io/en/latest/changes.html

@jamesmyatt
Copy link

jamesmyatt commented Jan 15, 2024

What's your stance on conda-forge? Pathy has a conda recipe and it can't resolve pathlib_abc 😓 I'm not sure how many people consume pathy through conda-forge.

(As an observer) conda-forge is community-led and anyone can submit a package as long as they're prepared to maintain the recipe; you don't need permission. The main question is whether @barneygale wants to officially support and maintain it himself. I'd raise this in the pathlib_abc issue tracker rather than here.

Note that you can have multiple maintainers, so you could submit the package with both you and Barney listed as maintainers, if that's what he wants.

@justindujardin
Copy link
Owner

FYI, I've published pathlib-abc 0.2.0. Changelog: https://pathlib-abc.readthedocs.io/en/latest/changes.html

I've been working on some updates to my Mathy project recently. I'll check out the latest pathlib_abc sometime soon.

I'd raise this in the pathlib_abc issue tracker rather than here.

Yeah, well Barney did ask for questions on this issue, but you're probably right.

I'll close this issue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants