Make os.walk and os.fwalk yield namedtuple instead of tuple #71047

palaviv · 2016-04-26T13:45:07Z

BPO	26860
Nosy	@loewis, @rhettinger, @giampaolo, @ethanfurman, @serhiy-storchaka, @palaviv
Files	os-walk-result-namedtuple.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2017-04-11.02:02:11.234>
created_at = <Date 2016-04-26.13:45:06.781>
labels = ['type-feature', 'library']
title = 'Make os.walk and os.fwalk yield namedtuple instead of tuple'
updated_at = <Date 2017-04-11.02:02:11.204>
user = 'https://github.com/palaviv'

bugs.python.org fields:

activity = <Date 2017-04-11.02:02:11.204>
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2017-04-11.02:02:11.234>
closer = 'rhettinger'
components = ['Library (Lib)']
creation = <Date 2016-04-26.13:45:06.781>
creator = 'palaviv'
dependencies = []
files = ['42612']
hgrepos = []
issue_num = 26860
keywords = ['patch']
message_count = 13.0
messages = ['264285', '264288', '264418', '264421', '264478', '264503', '264506', '264507', '291346', '291352', '291354', '291355', '291402']
nosy_count = 6.0
nosy_names = ['loewis', 'rhettinger', 'giampaolo.rodola', 'ethan.furman', 'serhiy.storchaka', 'palaviv']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue26860'
versions = ['Python 3.6']

palaviv · 2016-04-26T13:45:06Z

I am suggesting that os.walk and os.fwalk will yield a namedtuple instead of the regular tuple they currently yield.
The use case for this change can be seen in the next example:

def walk_wrapper(walk_it):
    for dir_entry in walk_it:
        if dir_entry[0] == "aaa":
           yield dir_entry

Because walk_it can be either os.walk or os.fwalk I need to access dir_entry via index.

My change will allow me to change this function to:

def walk_wrapper(walk_it):
    for dir_entry in walk_it:
        if dir_entry.dirpath == "aaa":
           yield dir_entry

Witch is more clear and readable.

ethanfurman · 2016-04-26T13:58:01Z

Quick review of patch looks good. I'll try to look it over more closely later.

rhettinger · 2016-04-28T06:42:48Z

Classes are normally named with CamelCase. Also, "walk_result" or "WalkResult" seems like an odd name that doesn't really fit. DirEntry or DirInfo is a better match (see the OP's example, "for dir_entry in walk_it: ...")

The "versionchanged" should be a "versionadded".

The docs should use "named tuple" instead of "namedtuple". The former is the generic term used in the glossary to describe the instances. The latter is the factory function that creates a new tuple subclass.

The attribute descriptions for the docs are pretty good. They should also be applied as actual docstrings in the code as well.

The docs and code for fwalk() needs to be harmonized with walk() so the the tuple fields use the same names: change (root, dirs, files) to (dirpath, dirnames, filenames).

serhiy-storchaka · 2016-04-28T08:00:48Z

Sorry, but I disagree with Raymond in many points.

Classes are normally named with CamelCase. Also, "walk_result" or "WalkResult" seems like an odd name that doesn't really fit. DirEntry or DirInfo is a better match (see the OP's example, "for dir_entry in walk_it: ...")

See "stat_result", "statvfs_result", "waitid_result", "uname_result", and "times_result". DirEntry is already used in the os module. And if accept this feature, needed separate types for walk() and fwalk() results.

The "versionchanged" should be a "versionadded".

os.walk() is not new. Just it's result is changed. Class "walk_result" can be tagged with "versionadded", but I'm not sure there is a need to document it separately. The documentation of the os module already too large. "uname_result" and "times_result" are not documented.

The docs and code for fwalk() needs to be harmonized with walk() so the the tuple fields use the same names: change (root, dirs, files) to (dirpath, dirnames, filenames).

(root, dirs, files) is shorter than (dirpath, dirnames, filenames) and these names were used with os.walk() and os.fwalk() for years.

I general, I have doubts about this feature.

There is little backward incompatibility. At least pickle is not backward compatible, and I guess other serialization methods.
os.walk() and os.fwalk() are purposed to be used in for loop with immediate unpacking result tuple:

    for root, dirs, files in os.walk(...):
        ...

Adding named tuple doesn't add any benefit for common case.

In OP case, you can either use fwalk-based implementation of walk (bpo-15200):

    def fwalk_as_walk(*args, **kwargs):
        for x in os.fwalk(*args, **kwargs):
            yield x[:-1]

or just ignore the rest of tuple items:

    for root, *_ in walk_it:
        ...

Using namedtuple is slower and consumes more memory than using tuple. Even for FS-related operation like os.walk() this can matter. A lot of code is optimized for exact tuples, with namedtuple this optimization is lost.
New names (dirpath, dirnames, filenames) are questionable. Why not use undersores (dir_names)? "dir" in dirpath refers to the current proceeded directory, but "dir" in dirnames refers to it's subdirectories. Currently you are free to use short names (root, dirs, files) from examples or what you prefer, but with namedtuple you are sticked with standard names forever. There are no names that satisfy everybody.
Third-party walk-like iterators generate tuples, so you can't use attribute access in too general code.

palaviv · 2016-04-29T08:42:25Z

In regard to Raymonds points I agree with Serhiys comments.

As for Serhiy`s doubts:

Using namedtuple is slower and consumes more memory than using tuple. Even for FS-related operation like os.walk() this can matter. A lot of code is optimized for exact tuples, with namedtuple this optimization is lost.

I did some testing on my own PC:
./python -m timeit -s "from os import walk" "for x in walk('Lib'): pass"

Regular tuple: 7.53 msec
Named tuple: 7.66 msec

New names (dirpath, dirnames, filenames) are questionable. Why not use undersores (dir_names)? "dir" in dirpath refers to the current proceeded directory, but "dir" in dirnames refers to it's subdirectories. Currently you are free to use short names (root, dirs, files) from examples or what you prefer, but with namedtuple you are sticked with standard names forever. There are no names that satisfy everybody.

I agree that there will be no names that will satisfy everybody but I think the names that are currently in the documentation are the most trivial choice.

As for points 1,2,5 this feature doesn`t break any of the old walk API.

One more point I would like input on is the testing. I can remove the walk method from the WalkTests, FwalkTests classes and use the new named tuple attributes in the tests. Do you think its better or should we keep the tests with the old API (access using indexes)?

ethanfurman · 2016-04-29T14:45:23Z

I'm not clear on what you asking, but regardless we should have both the old (by-index) tests and new by-attribute tests.

rhettinger · 2016-04-29T16:09:10Z

https://www.python.org/dev/peps/pep-0008/#class-names -- "Class names should normally use the CapWords convention."

Examples:
---------
crypt.py
6:from collections import namedtuple as _namedtuple
13:class _Method(_namedtuple('_Method', 'name ident salt_chars total_size')):

difflib.py
34:from collections import namedtuple as _namedtuple
36:Match = _namedtuple('Match', 'a b size')

dis.py
163:_Instruction = collections.namedtuple("_Instruction",
280: Generates a sequence of Instruction namedtuples giving the details of each

doctest.py
107:from collections import namedtuple
109:TestResults = namedtuple('TestResults', 'failed attempted')

functools.py
21:from collections import namedtuple
345:_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])

inspect.py
51:from collections import namedtuple, OrderedDict
323:Attribute = namedtuple('Attribute', 'name kind defining_class object')
968:Arguments = namedtuple('Arguments', 'args, varargs, varkw')
1008:ArgSpec = namedtuple('ArgSpec', 'args varargs keywords defaults')
1032:FullArgSpec = namedtuple('FullArgSpec',
1124:ArgInfo = namedtuple('ArgInfo', 'args varargs keywords locals')
1317:ClosureVars = namedtuple('ClosureVars', 'nonlocals globals builtins unbound')
1372:Traceback = namedtuple('Traceback', 'filename lineno function code_context index')
1412:FrameInfo = namedtuple('FrameInfo', ('frame',) + Traceback._fields)

nntplib.py
159:GroupInfo = collections.namedtuple('GroupInfo',
162:ArticleInfo = collections.namedtuple('ArticleInfo',

No doubt, there are exceptions to the rule in the standard library which is less consistent than we might like: "stat_result". That said, stat_result is a structseq and many C type names are old or violate the rules (list vs List, etc). New named tuples should follow PEP-8 can use CapWords convention unless there is a strong reason not to in a particular case.

palaviv · 2016-04-29T16:26:58Z

Thanks for the response Ethan I think that I will leave the tests as they are in the current patch.

No doubt, there are exceptions to the rule in the standard library which is less consistent than we might like: "stat_result". That said, stat_result is a structseq and many C type names are old or violate the rules (list vs List, etc). New named tuples should follow PEP-8 can use CapWords convention unless there is a strong reason not to in a particular case.

I actually thought we should keep on consistency with other "result" like objects. I can see your point about new named tuples that should follow PEP-8 and DirEntry is an example of new "result" class that follow PEP-8.
What names do you suggest? Maybe DirInfo and FDirInfo?

giampaolo · 2017-04-08T21:40:53Z

Should we have concerns about performances? Accessing a namedtuple value is almost 4x times slower compared to a plain tuple [1] and os.walk() may iterate hundreds of times.

http://stackoverflow.com/questions/2646157/what-is-the-fastest-to-access-struct-like-object-in-python

rhettinger · 2017-04-09T06:08:56Z

I would expect that the field access time is inconsequential compared to just about every other aspect of os.walk().

serhiy-storchaka · 2017-04-09T07:28:29Z

namedtuple's attribute access was optimized in recent years. In 3.7 it is 30% faster than in 3.4. So now it is only 3x times slower compared to a plain tuple. On other hand, os.walk() and os.fwalk() was optimized too. In 3.7 they are up to 3.5x times faster than in 3.4 (with hot caches). I didn't make measurements, but I expect that using namedtuples with os.walk() can decrease its performance at least by few percents.

My main concern is that this feature will increase the complexity of the documentation of the os module (very little) and may encourage writing less clear code (but this is just my own preference, others can found new style more clear).

serhiy-storchaka · 2017-04-09T07:29:03Z

s/at least/at most/

rhettinger · 2017-04-10T00:55:45Z

There doesn't seem to be a consensus that the proposal is a net win. Serhiy made a persuasive argument that the added complexity isn't worth it.

I'll leave this open for a day or two so that anyone else can make their case. Otherwise, I'll mark this as closed/rejected.

palaviv mannequin added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 26, 2016

pppery mannequin changed the title ~~os.walk and os.fwalk yield namedtuple instead of tuple~~ Make os.walk and os.fwalk yield namedtuple instead of tuple May 4, 2016

rhettinger self-assigned this Apr 9, 2017

rhettinger closed this as completed Apr 11, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make os.walk and os.fwalk yield namedtuple instead of tuple #71047

Make os.walk and os.fwalk yield namedtuple instead of tuple #71047

palaviv mannequin commented Apr 26, 2016

palaviv mannequin commented Apr 26, 2016

ethanfurman commented Apr 26, 2016

rhettinger commented Apr 28, 2016

serhiy-storchaka commented Apr 28, 2016

palaviv mannequin commented Apr 29, 2016

ethanfurman commented Apr 29, 2016

rhettinger commented Apr 29, 2016

palaviv mannequin commented Apr 29, 2016

giampaolo commented Apr 8, 2017

rhettinger commented Apr 9, 2017

serhiy-storchaka commented Apr 9, 2017

serhiy-storchaka commented Apr 9, 2017

rhettinger commented Apr 10, 2017

Make os.walk and os.fwalk yield namedtuple instead of tuple #71047

Make os.walk and os.fwalk yield namedtuple instead of tuple #71047

Comments

palaviv mannequin commented Apr 26, 2016

palaviv mannequin commented Apr 26, 2016

ethanfurman commented Apr 26, 2016

rhettinger commented Apr 28, 2016

serhiy-storchaka commented Apr 28, 2016

palaviv mannequin commented Apr 29, 2016

ethanfurman commented Apr 29, 2016

rhettinger commented Apr 29, 2016

palaviv mannequin commented Apr 29, 2016

giampaolo commented Apr 8, 2017

rhettinger commented Apr 9, 2017

serhiy-storchaka commented Apr 9, 2017

serhiy-storchaka commented Apr 9, 2017

rhettinger commented Apr 10, 2017