Skip to content

Pathlib Support#566

Closed
BrianPugh wants to merge 17 commits intomicropython:masterfrom
BrianPugh:pathlib
Closed

Pathlib Support#566
BrianPugh wants to merge 17 commits intomicropython:masterfrom
BrianPugh:pathlib

Conversation

@BrianPugh
Copy link
Contributor

Here's my first stab at adding most of the common functionality of pathlib.Path. I'd say that 99% of the common use-case support is there. The glob functionality could use some work; currently it only supports a single "*" wildcard; however, this is the vast majority of common use-cases and it won't fail silently if non-supported glob patterns are provided.

Currently the module is named upathlib; this was just so that I could get it working with pytest. If someone could help me import it in tests without colliding with the builtin pathlib, that would be greatly appreciated!

Unit tests require pytest-mock to be installed.

@andrewleech
Copy link
Contributor

Thanks for this. I've got a "full" copy of cpython pathlib ported to micropython I've been meaning to submit here.
https://gitlab.com/alelec/micropython-pathlib/-/tree/master/
The problem with my one is is big for micropython use.
Your one here looks quite efficient by comparison!

I'm keen to compare the two and see if there's any more I'd like to add her, but at first glance this looks quite good.

@BrianPugh
Copy link
Contributor Author

I just realized my implementation will not properly handle combining multiple Path objects since I basically just reduce them to absolute paths; gonna rework it and will have something better shortly.

@BrianPugh
Copy link
Contributor Author

alright; should be good now; fixed all the bugs I could find and added more tests.

@jimmo
Copy link
Member

jimmo commented Nov 7, 2022

Thanks @BrianPugh this looks useful!

I don't have much familiarity with using pathlib in CPython, mostly I just use os.path, which we also provide a minimal version of in micropython-lib. Do you think it makes sense to make pathlib build on os.path?

Two quick notes, I will try and do a more detailed review later:

  • It should use os.sep in place of the literal "/".
  • It should be called just pathlib.py not upathlib.py.

@BrianPugh
Copy link
Contributor Author

BrianPugh commented Nov 7, 2022

We could make it build on top of os.path, but it would hardly shorten any code, so I don't think it's worth the dependency. Further still, the current minimal os.path implementation doesn't get some corner cases correct, like stripping "." in the middle of an absolute path, or stripping redundant separators.

After looking through micropyhton-lib's os.path code, I did notice that currently I don't handle expanding ~. We could add this, but I also imagine that ~ path would be quite rare in micropython code.

As for the other feedback:

It should use os.sep in place of the literal "/".

micropython's os does not contain os.sep. Also, os.path in this repo has "/" hardcoded as well.

It should be called just pathlib.py not upathlib.py.

As I said in the original post, it's just named upathlib for now because I couldn't figure out how to not have it collide with cpython's builtin pathlib when running unit tests.

@andrewleech
Copy link
Contributor

It would be better to use unittest from this repo and run the tests with micropython rather than cpython :-)

@BrianPugh
Copy link
Contributor Author

BrianPugh commented Nov 7, 2022

I'll give it a try later today; i've never actually used the bare unittest module; always through pytest.

I agree that the definitive testing should be done with micropython, but cpython has nice things like pdb.

@andrewleech
Copy link
Contributor

Ah, yes I see you've used pytest fixtures and .raises a bit in the test code. Eventually I want to get CI running here testing everything with micropython unittest so being compatible with that is desired - and unfortunately pytest cannot be ported easily to micropython.

I'm not sure about the best way to replace the fixture usage, but at least unittest has a raises test: https://github.com/micropython/micropython-lib/blob/master/python-stdlib/unittest/unittest.py#L192

Otherwise the normal assert you've used in most places is good to go.

@BrianPugh
Copy link
Contributor Author

I don't want to sign myself up for too much work, but after I look at unittest this evening, I might just try porting some pytest functionality to micropython. We could just start off with super simple things like pytest.raises (should be trivial), and eventually work our way up to more complicated things like fixtures. I also use mock, but that could also be worked around. Lemme brainstorm on this for a little bit. For my own projects I unittest most of the "business logic" with cpython under the assumption that it'll work the same with micropython. That way I can develop faster and write more robust tests.

@andrewleech
Copy link
Contributor

You'll be popular of you do make a port of pytest - it's been regularly requested over the years!

The real pytest makes significant use of things like metaclasses, the inspect module and other low level things micropython doesn't support.
To be honest I've been using unittest here so long I don't really remember what more pytest provides that's "truly" useful. So I never started trying to make a version of it, I just put some work into filling missing features in unittest itself.

For a long time unittest was missing the ability to "find and run all the tests in the project" but that was added recently with the unittest.discover module.

Yep a raises assert is already provided by unittest so no difficulty there :-)

I don't remember what fixtures are, I should look at them... and yeah there's a million ways to mock but having a well documented set of tools to do it in a consistent way is always good.

But most importantly, pytest is obviously popular, so having a module with matching API for the most popular aspects makes it easier to port existing code and bring in developers already experienced with pytest like yourself!

@jimmo
Copy link
Member

jimmo commented Nov 8, 2022

After looking through micropyhton-lib's os.path code, I did notice that currently I don't handle expanding ~.

I would argue that we don't really need this in os.path either, but the idea is that you can also use these libraries on the unix port.

micropython's os does not contain os.sep. Also, os.path in this repo has "/" hardcoded as well.

Ah sorry I forgot -- it actually does, but for some reason only enabled on STM32, Renesas, Unix, and Windows. We should enable this on all ports. So I guess go with hard-coded "/" for now and we can fix this later.

@BrianPugh
Copy link
Contributor Author

alright, so I gave micropython-compatible-pytest more thought:

  1. Code that interacts with hardware is inherently hard to unit-test. In the vast majority of cases, there isn't a straight-forward automatic "it works." Consider debouncing a switch; it either works or it doesn't and it requires human interaction. For these cases, it's more of a "demo" than a test, but thats fine. An example where a unit test would actually work is issuing an info-like command on an i2c bus and getting an expected response from an external chip. Takeaway: unit testing's main focus shouldn't be testing hardware; leave that for demo scripts.

  2. Because our tests don't interact with hardware, I will claim that they should probably be portable among python implementations. I.e. unit tests should work with both micropython and cpython. This allows nicer cpython development tools and libraries (debuggers, static-analysis, matplotlib, etc) to be used.

  3. micropython's primary libraries is nearly a strict-subset of cpython. The only re-occuring difference I run into is os.ilistdir, which can be mimic'd without much trouble.

Assuming we primarily develop libraries using cpython and just want to do final testing with micropython, this brings us to the key issue: "whats the minimal subset of pytest that can be implemented in micropython without limiting pytest running on cpython?"

  1. Building a script to auto-discover tests can be done with a reasonable amount of code (as shown in unittest-discover).
  2. There will inherently be tests or dependencies used that will be unavailable in micropython. We could use a decorator like pytest.mark.skipmicropython decorator for those tests. This would be easy.
  3. A micropython version of pytest currently cannot support decorators since there isn't a way to access function signatures beyond the function name (via __name__). So all tests that require a fixture would have to be skipped.
  4. Especially since at some point there's going to be an item that interacts with hardware, those should be mocked. So we need to also port mocking code.
  5. With all of this, we would certainly want a way to configure it. We could naively just do JSON, but the proper thing would be to put configurations in pyproject.toml; so now we would also need to port a toml parser (which i think would be independently useful, but more work for now).
  6. We'd also certainly want to support pytest.raises, but that's trivial to implement.

So, this is a long winded way of saying that I think a useful micropython-compatible-pytest would be a lot of work and couldn't actually be properly implemented given the current micropython implementation. If the pytest-port isn't "compatible enough" with the normal version, then there's really no point.

As for this PR, where does this put us? I don't think i could change all the tests to work with micropyhton unittest since I need to mock a few calls (like os.getcwd()).

@BrianPugh
Copy link
Contributor Author

So I guess go with hard-coded "/" for now and we can fix this later.

I'll factor out the separator symbol as it is in os.path, so that it's trivial to support later.

Do we want me to add support for "~"?

@BrianPugh
Copy link
Contributor Author

so actually; i'm going to leave the separator hardcoded for now; this MIGHT only cause issues with windows? I just say that because in order for it to really work, i'd have to change how root is reported, correct? I don't really have access to a windows machine for testing.

@andrewleech
Copy link
Contributor

Assuming we primarily develop libraries using cpython

Fwiw I never develop micropython libraries with cpython. There's enough minor corner cases in micropython/cpython compatibility and/or I'm often working with new micropython C features so I pretty much always use the micropython Unix port for all testing - including test-driven-development which is my preference.

I'm more than happy to continue without pytest myself, as mentioned I'm already comfortable with unittest, but I'm also very keen to get all test _* files run in CI with micropython unittest - so will want them to all be compatible with that (eventually).

@BrianPugh
Copy link
Contributor Author

BrianPugh commented Nov 8, 2022

so i'm working on porting the tests over to unittest, and in order to do so I'm going to implement some features of tempfile. For this, I'm using the available shutil. Is the implementation broken, or am I missing something:

for path, dirs, files in os.walk(top, False):

os.walk doesn't exist.

@BrianPugh
Copy link
Contributor Author

I'll update the tests here once #573 is merged.

@BrianPugh
Copy link
Contributor Author

Alright, I have migrated the pytest to unittest and it all runs as expected in micropython. Read for another review!

@stinos
Copy link

stinos commented Nov 11, 2022

So I guess go with hard-coded "/" for now and we can fix this later.

Looking at the code I have the impression it might be worth a relatively small effort and fix this now already, at least partially. For example we have os.path.join so then using '/'.join(paths) etc seems like a step back. And in other locations it should actually be using both os.sep and os.altsep (e.g. trimming and deduplicating separators) so doesn't seem that much effort to just do that now alredy (e.g. just rstrip and replcae both seps). Or at least I'd suggest defining a sep 'constant' at the top so it could later on be replaced in a single place instead of across the whole file.

@BrianPugh
Copy link
Contributor Author

Using os.path just to get the separator and join seems like a bit of bloat for the sake of DRY. If os.path was builtin, I would agree we should use it. Typically, if a project is using pathlib, it won't need os.path (unless a dependency is using it).

I'd rather defer the work to a future PR when the needs/requirements are more obvious.

@stinos
Copy link

stinos commented Nov 11, 2022

just to get the separator and join

And isdir, isfile, abspath, possibly more.

bloat for the sake of DRY

That's a bit of a strange argument, seeing that pathlib itself is sort of the definition of bloat for the sake of DRY :P I mean, it's quite big already itself but is essentially os and os.path with mostly a bunch of very short functions on top of it just to avoid having to repeat 1 or 2 lines of code

@andrewleech
Copy link
Contributor

bloat for the sake of DRY

I think this is a bit unfair in both cases ;-) I personally see pathlib as an almost perfect evolution of python style and best practice - it makes application code shorter and imho infinitely more readable - similar to the evolution of string formatting -> f-strings.

While I'm usually all-for avoiding any kind of duplication, in this case I definitely agree we should not make a dependency on os.path (two packages) just to pull in a single byte definition and a tiny join function. That's using quite a lot more flash use for little gain.
Once I've got pathlib installed I never use the os.path functions directly.

@BrianPugh
Copy link
Contributor Author

so what would we like to do with this?

@dpgeorge
Copy link
Member

dpgeorge commented Nov 14, 2022

Or at least I'd suggest defining a sep 'constant' at the top so it could later on be replaced in a single place instead of across the whole file.

I think this is a good idea to start with, just to eliminate the hard-coded constants. Eg:

from micropython import const

_SEP = const("/")

@BrianPugh
Copy link
Contributor Author

@dpgeorge addressed!

@stinos
Copy link

stinos commented Nov 14, 2022

so what would we like to do with this?

I'd like to use pathlib, but then it should work at least partially for backslashes; as in: should be fine that internally it uses forward slashes since 99% of Windows APIs work with those but the public facing parts accepting paths should handle the backslashes. And the 'is root' needs a fix.

Now, if this doesn't make it into the first version that's fine and I can implement it myself later on, but: I just don't see how that is going to happen without reusing code, that's just a bit too much and not straightforward enough to just duplicate. So if it's really such a problem to import os.path here, which will need all fixes for backslashes as well, then part of it could be implemented in the os module in C.

@dpgeorge
Copy link
Member

I can see both sides of the discussion here:

  1. keep pathlib self contained, if that's all that's needed by a project then os.path is never needed at all
  2. pathlib will eventually need support for Windows-style paths, and doing that is a bit tricky and needs to be done in both os.path and pathlib, so may as well just do it in os.path and build pathlib on top of that (alternatively build os.path on pathlib??)

I think in the long term we need to follow (2). As discussed above, already os.path.abspath and pathlib.Path.absolute are duplicated, and this function is not a trivial 1-2 liner, as evidenced by the fact that the current os.path.abspath is broken in ways that pathlib's absolute has fixed. So already there one could factor the code into one location (into os.path.abspath).

To make progress, I'm happy to merge this as-is and improve it later (ie factor code into os.path and make pathlib depend on that). But I think @stinos prefers the case of do it right from the beginning.

@BrianPugh
Copy link
Contributor Author

I'm in favor of merging now and fixing later. Not that this refactor would be particularly complicated, but it would be nice to have more unittesting/CI as guard rails. I understand the desire for deduplicating code, but I think thats more of a bigger effort (needs to be fixed in multiple modules). Additionally, I think we need to formalize the deduplication/dependency trade-off. I know each case is different, but generally there needs to be an agreement of "how good/bad is it to include another dependency." Micropython package management has gotten better recently with mip, but its not quite at the level of cpython.

@dpgeorge
Copy link
Member

OK, let's move forward and merge this, then in the future it can be improved if needed.

Squashed and merged in 0051a5e

@dpgeorge dpgeorge closed this Nov 17, 2022
@dpgeorge
Copy link
Member

Additionally, I think we need to formalize the deduplication/dependency trade-off. I know each case is different, but generally there needs to be an agreement of "how good/bad is it to include another dependency."

Yes it's tricky to formalise this. For example, if you already use os.path in your project then it's better if pathlib also uses it to reduce code size, even if it's only for one simple function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants