Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Namespacing for file operations #58

Closed
sametmax opened this issue Feb 24, 2014 · 8 comments
Closed

Namespacing for file operations #58

sametmax opened this issue Feb 24, 2014 · 8 comments

Comments

@sametmax
Copy link
Collaborator

I gave some thoughts to the discussion we had here : #26

The status proposal gave me an idea. We could actually generalize since the path class is becomming big, and maybe it would be beneficial to start separating responsabilities in the lib.

Some actions are clearly not related to a path, but to a file such as :

  • calculating md5
  • gettings stats
  • getting text from file
  • getting mimetype
    etc

Also, the fact the class path is supposed to deal with only a path make it difficult to add handy features that are file related without making it a strange melting pot.

This is why I'm suggesting we make a namespace dedicated to file operations. We could simply name it file, and it would be used this way :

>>> p = path('/etc/fstab')
>>> p.file.file_operation()

This would remove from the main namespace a lot of methods, making shell exploration easier. It would also be the occasion to remove duplicates, matching the Zen of Python "there should be only one way to do it". Currently we keep several names to match the os.path API and aliases for compat reasons. While this is supposed to help, with the current size of the lib, it's becoming less and less convenient to instinctly find the proper method without several trials or looking at the source code.

Enventually, it allows us to make the API more modern and adopt a fluid syntaxe :

statvfs() -> p.file.statvfs()
atime and getatime() -> p.file.atime
mtime and getmtime() -> p.file.mtime
ctime and getctime() -> p.file.ctime
utime -> p.file.utime
write_text() -> p.file.write(data)
write_bytes() -> p.file.write(data, 'b')
text() -> p.file.text
?() -> p.file.bytes
lstat() -> p.file.lstat()
read_hash() -> p.file.hash()
read_hexhash() -> p.file.hash().hexdigest()
read_md5() -> p.file.md5()
chunks() -> p.file.chunks()
getsize() -> p.file.size
stat() -> p.file.stat()
inplace() -> p.file.inplace()

file could also be a callable that opens the file and returns a file like object :

for line in p.file():
    # do stuff

The file() callable would accept all the parameters codec.open accept, a bit like what path.open does, but with encoding as well (set to 'utf8' as default like in Python 3).

However, it would not returns a stdlib file like object, but rather a proxy (or a monkeypatched deepcopy for perfs) with all the same attributes except an additional .path returning the original path object used to open the file :

>>> p.file().path is p
True

Last time you shared with me your concern about being backward incompatible, so i'm suggesting this to be added in an experimental branch in which we can play with what can be the future of path.py. Including all rejected features that would have made it backward incompatible, just to see where it leads us.

This branch can be used for future features such as asynchronous file system manipulations, name wrangling, advanced permission checks or anything that is not acceptable to put in the good old path.py.

path.py is now very stable and feature complete, and I don't think it's going to move much. We can safely support this version ad vitam and consider freezing it in the comming years while trying new things on the experimental branch.

Another benefit of the experimental branch is we can drop 2.7 support and focus on Python 3, making the codebase a bit smaller. path.py as is will always be supported for 2.7 and 3 and always be available. We don't really need to touch it.

@jaraco
Copy link
Owner

jaraco commented Feb 24, 2014

On one hand, I like namespaces (let's do more of those). On the other hand, this new model is more verbose. I want to explore some concerns with that approach.

For the majority of use cases, user's will be invoking the 'file' apis, so when they port their code, they will be replacing 'p.op()' with 'p.file.op()' everywhere. I suspect as they're doing that find/replace in their codebase, they will begin to wonder, "is the .file attribute really necessary?"

A secondary concern is that 'file' almost runs contradictory to some operations, such as listdir. I don't see listdir mentioned above. Would directory operations be in their own namespace, then? What about operations that cross purposes (such as mtime or exists)?

I believe you're on to something here, but surely the devil is in the details. Do please feel free to put together the experimental branch for further evaluation and discussion. Do consider as well:

As currently deployed, path.py is released as a single module. This simplicity of layout allows other projects to "vendor" path.py by adding that module to their project (e.g. Paver). You should try to retain that constraint.

It's also too early to drop support for Python 2.7. While I agree that Python 3 is the new target platform for innovation, any project that's going to remain popular will need to support Python 2 for the next year or two. If Python 2.7 support is omitted from the experimental branch, it will certainly be back-ported to 2.7 shortly after its initial release by somebody. Probably better just to maintain Python 2.7 support in the branch if it's not too onerous. If dropping 2.7 support is essential to this effort, then let's ticket the deprecation separately and have this ticket depend on that deprecation.

@sametmax
Copy link
Collaborator Author

sametmax commented Mar 4, 2014

Ok, I just pushed 2 branches : experimental, which will be the 'master', but for all the experimental features, and "filenamespacing", which is one feature branch dedicated to namespacing specifically.

I worked a bit on namespacing today, and moved most file operations to the file namespace.

Unit tests pass with py.test under Python 2.7 and Python 3.3.

For now the result is that come from :

>>> p.
Display all 142 possibilities? (y or n)
p.abspath
p.chroot
p.dirname
p.files
p.index
p.isnumeric
p.lstat
p.normcase
...

TO

>>> p.
Display all 115 possibilities? (y or n)
p.abspath
p.copyfile
p.encode
p.files
p.isalnum
p.isnumeric
p.lower
...

>>> p.file.
p.file.access
p.file.chmod
p.file.ctime
p.file.getctime
p.file.in_place
p.file.mtime
p.file.path
p.file.read_hexhash
p.file.stat
p.file.utime
p.file.write_text
p.file.atime
p.file.chown
p.file.get_owner
p.file.getmtime
p.file.lines
p.file.open
p.file.pathconf
p.file.read_md5
p.file.statvfs
p.file.write_bytes
p.file.bytes
p.file.chunks
p.file.getatime
p.file.getsize
p.file.lstat
p.file.owner
p.file.read_hash
p.file.size
p.file.text
p.file.write_lines 

I didn't make crazy changes for now : no removed alias, no renamed methods, etc. I prefer that we agree on each little increment and merge them one at a time.

Let me know if I can merge that to "experiment".

@sametmax
Copy link
Collaborator Author

Ok, you seem busy so I took a bet and merged it.

I will add more to it later.

@jaraco
Copy link
Owner

jaraco commented Mar 19, 2014

It's true. I've been busy. I don't know when I'll have time to review the proposal in order to answer my questions.

@sametmax
Copy link
Collaborator Author

It's ok, i contribute once a century so I won't complain :)

Le mer. 19 mars 2014 10:33:27 CET, Jason R. Coombs a écrit :

It's true. I've been busy. I don't know when I'll have time to review
the proposal in order to answer my questions.


Reply to this email directly or view it on GitHub
#58 (comment).

@Nodd
Copy link
Contributor

Nodd commented Sep 3, 2014

Just my 2 cents, I don't like the added namespace, it is way too verbose. I would prefer a backward incompatible new version of path.py.

Another way to implement file-only methods or properties is to create a subclass to path dedicated to files. This was if the user wants to use a file-only method, he has to explicitly cast the path first.

@jaraco
Copy link
Owner

jaraco commented Sep 27, 2014

I like Nodd's suggestion, but it does fall prey to the problem with OO inheritance. How do you create a File(Path), which is a Path, but which doesn't implement the full interface of a path (directory-only methods)?

Perhaps instead you can have the Path class derive from two classes, File and Directory. I'm not even sure that's possible if File and Directory are themselves subclasses of str. They could be mix-ins, but in that case, a Path cannot be readily cast to one of those types.

I think an alternate PR would be in order.

@jaraco
Copy link
Owner

jaraco commented Jul 13, 2015

Although these ideas have their merits, we seem to have stalled on a suitable interface. I'm closing for now, but feel free to revive as interest allows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants