Make os.walk and os.fwalk yield namedtuple instead of tuple #71047
I am suggesting that os.walk and os.fwalk yield a namedtuple instead of the regular tuple they currently yield. For example:

```python
def walk_wrapper(walk_it):
    for dir_entry in walk_it:
        if dir_entry[0] == "aaa":
            yield dir_entry
```

Because walk_it can be either os.walk or os.fwalk, I need to access dir_entry by index. My change would allow me to rewrite this function as:

```python
def walk_wrapper(walk_it):
    for dir_entry in walk_it:
        if dir_entry.dirpath == "aaa":
            yield dir_entry
```

which is clearer and more readable.
Quick review of patch looks good. I'll try to look it over more closely later.
1. Classes are normally named with CamelCase. Also, "walk_result" or "WalkResult" seems like an odd name that doesn't really fit. DirEntry or DirInfo is a better match (see the OP's example, "for dir_entry in walk_it: ...").
2. The "versionchanged" should be a "versionadded".
3. The docs should use "named tuple" instead of "namedtuple". The former is the generic term used in the glossary to describe the instances; the latter is the factory function that creates a new tuple subclass.
4. The attribute descriptions in the docs are pretty good. They should also be applied as actual docstrings in the code.
5. The docs and code for fwalk() need to be harmonized with walk() so that the tuple fields use the same names: change (root, dirs, files) to (dirpath, dirnames, filenames).
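Points 4 and 5 could look roughly like this. It is a sketch only: `WalkResult` and `FWalkResult` are hypothetical names (the naming itself is debated in this thread), and the docstring texts paraphrase the os.walk() documentation. Class and field docstrings of a namedtuple-created class are writable in Python 3.5 and later.

```python
from collections import namedtuple

# Hypothetical result types; the os module defines neither of them.
WalkResult = namedtuple("WalkResult", "dirpath dirnames filenames")
FWalkResult = namedtuple("FWalkResult", "dirpath dirnames filenames dirfd")

WalkResult.__doc__ = "One step of os.walk(): (dirpath, dirnames, filenames)."
FWalkResult.__doc__ = "One step of os.fwalk(): (dirpath, dirnames, filenames, dirfd)."

# Field descriptions paraphrased from the os.walk() documentation.
FIELD_DOCS = {
    "dirpath": "Path to the directory being visited.",
    "dirnames": "Names of the subdirectories in dirpath.",
    "filenames": "Names of the non-directory files in dirpath.",
    "dirfd": "File descriptor referring to dirpath (fwalk only).",
}

for cls in (WalkResult, FWalkResult):
    for field in cls._fields:
        # Each class gets its own field accessors, and their __doc__ is writable.
        getattr(cls, field).__doc__ = FIELD_DOCS[field]
```

Because the first three fields share names, code that only reads `dirpath`, `dirnames`, or `filenames` works with either type.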
Sorry, but I disagree with Raymond on many points.
See "stat_result", "statvfs_result", "waitid_result", "uname_result", and "times_result". DirEntry is already used in the os module. And if this feature is accepted, separate types will be needed for the walk() and fwalk() results.
os.walk() is not new; only its result changes. The "walk_result" class can be tagged with "versionadded", but I'm not sure it needs to be documented separately. The documentation of the os module is already too large, and "uname_result" and "times_result" are not documented.
(root, dirs, files) is shorter than (dirpath, dirnames, filenames), and these names have been used with os.walk() and os.fwalk() for years.
In general, I have doubts about this feature. The common case is:

```python
for root, dirs, files in os.walk(...):
    ...
```

Adding a named tuple doesn't add any benefit for the common case. In the OP's case, you can either use an fwalk-based implementation of walk (bpo-15200):

```python
def fwalk_as_walk(*args, **kwargs):
    for x in os.fwalk(*args, **kwargs):
        yield x[:-1]
```

or just ignore the rest of the tuple items:

```python
for root, *_ in walk_it:
    ...
```
In regard to Raymond:
As for Serhiy's doubts:
I did some testing on my own PC: Regular tuple: 7.53 msec
I agree that no names will satisfy everybody, but I think the names currently in the documentation are the most obvious choice. As for points 1, 2, and 5, this feature doesn't break any of the old walk API.
One more point I would like input on is the testing. I could remove the walk method from the WalkTests and FwalkTests classes and use the new named tuple attributes in the tests. Do you think that's better, or should we keep the tests with the old API (access by index)?
I'm not clear on what you're asking, but regardless, we should have both the old (by-index) tests and the new by-attribute tests.
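A sketch of what keeping both kinds of tests could look like, assuming a patched os.walk(). Since no patch is applied here, `WalkResult` and `patched_walk()` are hypothetical stand-ins; in the real test suite the tests would call os.walk() directly.

```python
import os
import tempfile
import unittest
from collections import namedtuple

# Hypothetical stand-in for a patched os.walk() that yields named tuples.
WalkResult = namedtuple("WalkResult", "dirpath dirnames filenames")


def patched_walk(top, **kwargs):
    for step in os.walk(top, **kwargs):
        yield WalkResult._make(step)


class WalkResultTests(unittest.TestCase):
    def setUp(self):
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.top = tmp.name
        os.mkdir(os.path.join(self.top, "sub"))
        open(os.path.join(self.top, "file.txt"), "w").close()

    def test_index_access(self):
        # Old API: indexing and 3-way unpacking must keep working.
        step = next(patched_walk(self.top))
        self.assertEqual(step[0], self.top)
        self.assertEqual(step[1], ["sub"])
        self.assertEqual(step[2], ["file.txt"])
        dirpath, dirnames, filenames = step
        self.assertEqual((dirpath, dirnames, filenames),
                         (self.top, ["sub"], ["file.txt"]))

    def test_attribute_access(self):
        # New API: each named field is the same object as the matching index.
        step = next(patched_walk(self.top))
        self.assertIs(step.dirpath, step[0])
        self.assertIs(step.dirnames, step[1])
        self.assertIs(step.filenames, step[2])
```

Saved as a module, this runs under `python -m unittest`.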
https://www.python.org/dev/peps/pep-0008/#class-names -- "Class names should normally use the CapWords convention." Examples: difflib.py, dis.py, doctest.py, functools.py, inspect.py, nntplib.py. No doubt there are exceptions to the rule in the standard library, which is less consistent than we might like: "stat_result". That said, stat_result is a structseq, and many C type names are old or violate the rules (list vs. List, etc.). New named tuples should follow PEP 8 and use the CapWords convention unless there is a strong reason not to in a particular case.
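For reference, the existing os result types mentioned in this thread are structseqs: tuple subclasses whose type names use lower_case_with_underscores and which expose named fields alongside index access. A quick check, with expected values in the comments:

```python
import os

# os.stat() and os.times() return structseq instances, not CapWords classes.
st = os.stat(".")
print(type(st).__name__)        # stat_result
print(isinstance(st, tuple))    # True
print(st[0] == st.st_mode)      # True: index 0 is the st_mode field

t = os.times()
print(type(t).__name__)         # times_result
print(t[0] == t.user)           # True: index 0 is the user field
```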
Thanks for the response, Ethan. I think I will leave the tests as they are in the current patch.
I thought we should stay consistent with the other "result"-like objects, but I can see your point that new named tuples should follow PEP 8, and DirEntry is an example of a newer "result" class that follows PEP 8.
Should we be concerned about performance? Accessing a namedtuple value is almost 4x slower than accessing a plain tuple [1], and os.walk() may iterate hundreds of times.
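One way to check the access-cost claim on a given interpreter is a `timeit` micro-benchmark. This is a sketch: `WalkResult` is a hypothetical type, the sample values are arbitrary, and the ratio depends on Python version and hardware, so no result is shown here.

```python
import timeit
from collections import namedtuple

# Hypothetical named tuple using the field names proposed in this issue.
WalkResult = namedtuple("WalkResult", "dirpath dirnames filenames")

PLAIN = ("/tmp", [], [])
NAMED = WalkResult("/tmp", [], [])


def compare_access(number=1_000_000):
    """Time `number` reads of the first field by index and by attribute.

    Returns (index_seconds, attribute_seconds).
    """
    t_index = timeit.timeit("p[0]", globals={"p": PLAIN}, number=number)
    t_attr = timeit.timeit("n.dirpath", globals={"n": NAMED}, number=number)
    return t_index, t_attr


if __name__ == "__main__":
    t_index, t_attr = compare_access()
    print(f"index: {t_index:.3f}s  attribute: {t_attr:.3f}s  "
          f"ratio: {t_attr / t_index:.1f}x")
```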
I would expect that the field access time is inconsequential compared to just about every other aspect of os.walk().
namedtuple's attribute access has been optimized in recent years. In 3.7 it is 30% faster than in 3.4, so now it is only 3x slower than a plain tuple. On the other hand, os.walk() and os.fwalk() were optimized too: in 3.7 they are up to 3.5x faster than in 3.4 (with hot caches). I didn't make measurements, but I expect that using namedtuples with os.walk() can decrease its performance at least by a few percent. My main concern is that this feature will increase the complexity of the os module documentation (if only slightly) and may encourage writing less clear code (but this is just my own preference; others may find the new style clearer).
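The question raised in these comments, whether named-tuple overhead matters next to the cost of os.walk() itself, can be checked end to end. The sketch below is an approximation under stated assumptions: it wraps each step of today's os.walk() in a hypothetical `WalkResult` (so it also pays for building the named tuple, as a patched pure-Python os.walk() would), and it walks a small synthetic tree with hot caches. Real trees and cold caches will give different numbers.

```python
import os
import tempfile
import timeit
from collections import namedtuple

# Hypothetical result type; os.walk() itself yields plain tuples.
WalkResult = namedtuple("WalkResult", "dirpath dirnames filenames")


def walk_plain(top):
    """Count files using os.walk() as it is today (tuple unpacking)."""
    return sum(len(files) for _, _, files in os.walk(top))


def walk_named(top):
    """Same work, but wrap each step in a WalkResult and read a field."""
    return sum(len(WalkResult._make(step).filenames) for step in os.walk(top))


def make_tree(top, ndirs=100, nfiles=20):
    """Build a synthetic tree: ndirs directories with nfiles empty files each."""
    for i in range(ndirs):
        d = os.path.join(top, f"d{i}")
        os.mkdir(d)
        for j in range(nfiles):
            open(os.path.join(d, f"f{j}"), "w").close()


def overhead(top, repeat=5):
    """Best-of-`repeat` timings in seconds for (plain, named)."""
    plain = min(timeit.repeat(lambda: walk_plain(top), number=1, repeat=repeat))
    named = min(timeit.repeat(lambda: walk_named(top), number=1, repeat=repeat))
    return plain, named


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        make_tree(top)
        plain, named = overhead(top)
        print(f"plain: {plain * 1e3:.2f} ms  named: {named * 1e3:.2f} ms  "
              f"overhead: {(named / plain - 1) * 100:+.1f}%")
```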
s/at least/at most/
There doesn't seem to be a consensus that the proposal is a net win. Serhiy made a persuasive argument that the added complexity isn't worth it. I'll leave this open for a day or two so that anyone else can make their case. Otherwise, I'll mark this as closed/rejected.
Note: this issue was migrated from bugs.python.org; its recorded fields reflect the state of the issue at the time it was migrated and might not reflect the current state.