Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fnmatch.filterfalse function #74598

Open
stevendaprano opened this issue May 20, 2017 · 17 comments
Open

Add fnmatch.filterfalse function #74598

stevendaprano opened this issue May 20, 2017 · 17 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@stevendaprano
Copy link
Member

BPO 30413
Nosy @brettcannon, @rhettinger, @vstinner, @stevendaprano, @serhiy-storchaka, @wm75
Dependencies
  • bpo-30415: Improve fnmatch testing
  • Files
  • filterfalse.diff
  • filterfalse.alternate_patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2017-05-20.13:19:02.837>
    labels = ['3.7', 'type-feature', 'library']
    title = 'Add fnmatch.filterfalse function'
    updated_at = <Date 2019-08-21.16:32:30.493>
    user = 'https://github.com/stevendaprano'

    bugs.python.org fields:

    activity = <Date 2019-08-21.16:32:30.493>
    actor = 'gvanrossum'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2017-05-20.13:19:02.837>
    creator = 'steven.daprano'
    dependencies = ['30415']
    files = ['46879', '46880']
    hgrepos = []
    issue_num = 30413
    keywords = ['patch']
    message_count = 17.0
    messages = ['294029', '294031', '294036', '294038', '294039', '294041', '294078', '294110', '294111', '294132', '294133', '294510', '350056', '350062', '350066', '350090', '350093']
    nosy_count = 7.0
    nosy_names = ['brett.cannon', 'rhettinger', 'vstinner', 'steven.daprano', 'serhiy.storchaka', 'wolma', 'Roee Nizan']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue30413'
    versions = ['Python 3.7']

    @stevendaprano
    Copy link
    Member Author

    There has been some discussion on Python-Ideas about adding fnmatch.filter_false to complement filter, for when you want to ignore matching files rather than non-matching ones.

    https://mail.python.org/pipermail/python-ideas/2017-May/045694.html

    I don't believe there have been any strong objections, and work-arounds have a considerable performance penalty (according to the OP, I have not confirmed that).

    Here's a patch which refactors filter and adds filter_false.

    Sorry I haven't got my head around the brave new world on Github, so I'm going old school here with a diff :-)

    @stevendaprano stevendaprano added 3.7 (EOL) end of life type-feature A feature request or enhancement labels May 20, 2017
    @serhiy-storchaka
    Copy link
    Member

    In general LGTM. I left few minor comments on Rietveld. Needed the versionadded directive, Misc/NEWS and What's New entries.

    But shouldn't the new function have name "filterfalse" for parallel with itertools.filterfalse? And this would better match the style of the fnmatch module, fnmatchcase doesn't contain an underscore.

    Tests for fnmatch.filter are too poor. I think we should add more tests (this is other issue).

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented May 20, 2017

    Hi,
    seems I had the same thoughts as you, Steven. I had started working on a patch independently yesterday, but after making my changes to fnmatch itself, I found I had too many other things to do to address unittests and doc changes to turn this into a real patch - so thank you for spending time on all of this.
    I just downloaded your patch and merged it with mine because I think my version of fnmatch.py may be simpler and slightly faster (though like you I didn't run performance tests). Feel free to do whatever you like with this alternate version - it's not that there is much new in it to take credit for :)

    Another thing I noted: fnmatch takes a rather unusual approach to determine whether normcase is NOP or not. It imports posixpath only to see if it is the same as os.path. That looks pretty weird and wasteful to me (especially if you are running this on Windows and are effectively importing posixpath just for fun then). I think it would be better to just check os.name instead (like pathlib does for example). I moved the corresponding check to the top of the module to make this easier to address, should that be of interest. I'm also using the result of the check in fnmatch.fnmatch because I don't see any reason why you wouldn't want to benefit from it there as well.

    @serhiy-storchaka
    Copy link
    Member

    Your patch makes filter() returning normalized names. This may be not what is expected.

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented May 20, 2017

    Does it? I thought it does so only if normalize_case is True.
    Did I miss something?

    @serhiy-storchaka
    Copy link
    Member

    Yes, it does so only if normalize_case is True. Currently fnmatch.filter() returns non-normalized names.

    @serhiy-storchaka serhiy-storchaka added the stdlib Python modules in the Lib dir label May 20, 2017
    @stevendaprano
    Copy link
    Member Author

    I'm happy for you to change the name to filterfalse.

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented May 21, 2017

    @serhiy: my bad! I just hadn't realized this behavior of the original.
    With this requirement I cannot see any simpler solution than Steven's.

    Some other questions though to everyone involved:

    1. what do you think about "os.path is posixpath" vs just checking os.name == 'posix' as I suggested earlier?

    2. speaking of missing functionality in filter:
      What do you think of a keyword argument like 'case' to both filter and filterfalse that, when True, would make these functions behave equivalently to
      [n for n in names if fnmatchcase(n, pattern)]
      The default would be False, of course. I know this would be inconsistent in terms of argument vs separate functions, but it is easy to explain and learn and without it Windows users of filter/filterfalse would really suffer from the much poorer performance due to the slow normcase call (even slower now with the new fspath protocol) even if they pass in normalized names already.

    3. not directly related to this issue, but I came across it in this context:

    isn't normcase in both posixpath and ntpath doing isinstance(str, bytes) checks that are redundant with os.fspath? Is this oversight or am I missing something again?

    @serhiy-storchaka
    Copy link
    Member

    1. what do you think about "os.path is posixpath" vs just checking os.name == 'posix' as I suggested earlier?

    Currently os.path is posixpath if and only if os.name == 'posix'. But this can be changed in future. I think it is safer to the current check. The posixpath module doesn't contain classes, enums or namedtuples, it's import is fast.

    And it would be safer to keep the check in the function rather than move it at module level.

    What do you think of a keyword argument like 'case' to both filter and filterfalse that, when True, would make these functions behave equivalently to
    [n for n in names if fnmatchcase(n, pattern)]

    This is a different issue. fnmatch.filter() was added for speeding up glob.glob() (see bpo-409973). I don't know whether there is a good use case for filtercase().

    isn't normcase in both posixpath and ntpath doing isinstance(str, bytes) checks that are redundant with os.fspath?

    Good catch! It seems to me that they are redundant. Please open a new issue for this.

    @serhiy-storchaka serhiy-storchaka changed the title Add fnmatch.filter_false function Add fnmatch.filterfalse function May 21, 2017
    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented May 22, 2017

    Good catch! It seems to me that they are redundant. Please open a new issue for this.

    done: bpo-30427

    @wm75
    Copy link
    Mannequin

    wm75 mannequin commented May 22, 2017

    Yet another thing I just realized (sorry for being so annoying):

    With os.normcase calling os.fspath in 3.6+ it is not really a NOP anymore even on posix. As a consequence, you can now do some weird things with fnmatch: in all cases, and only in these, where the module *does* call normcase you can pass in Path-like objects as the names.
    This works with fnmatch.fnmatch on all platforms, never for fnmatchcase, and platform-dependently (on Windows, but not on posix platforms) for filter/filterfalse.
    It's mostly that last case that worries me because it makes it easy to accidentally write code that is not platform-independent.

    Also note that you can also pass a Path-like object as pat everywhere except in fnmatchcase.

    @brettcannon
    Copy link
    Member

    I don't think os.normcase() calling os.fspath() needs to be a worry. Pragmatically it's just getting a str/bytes from an object that knows how to return a str/bytes for a path. IOW it's no different than os.normcase(str(path)) and so embedding the call such that it's not a flat-out no-op shouldn't be a general concern.

    @RoeeNizan
    Copy link
    Mannequin

    RoeeNizan mannequin commented Aug 21, 2019

    Was this abandoned? Is there a reason not to accept this?

    @vstinner
    Copy link
    Member

    Rather than adding a new function, why not adding a parameter like sort(key=func, reverse=True)? Something like fnmatch.filterfalse(pat, invert=True)?

    The Unix grep command has a --invert-match option:

       -v, --invert-match
              Invert the sense of matching, to select non-matching lines.
    

    The Unix find and test commands use "!" to invert a condition. Example:

    find ! -name "*.py" # list files which doesn't match ".py"
    test ! -e setup.py # true if setup.py doesn't exist

    @stevendaprano
    Copy link
    Member Author

    On Wed, Aug 21, 2019 at 10:51:02AM +0000, STINNER Victor wrote:

    Rather than adding a new function, why not adding a parameter like
    sort(key=func, reverse=True)? Something like fnmatch.filterfalse(pat,
    invert=True)?

    Guido argues that as a general rule of thumb, we should avoid "constant
    bool parameters" and prefer seperate functions. For example, we have

    • re.search and re.match, not re.search(start=True)

    • str.find and str.rfind, not str.find(end=False)

    If we typically call the function with a bool constant:

        filter(pat, invert=True)

    rather than a variable or expression

        filter(pat, invert=condition or default)

    that's a hint that the two cases probably should be seperate functions.

    Martin Fowler agrees:

    https://martinfowler.com/bliki/FlagArgument.html

    as does Raymond Chen:

    https://devblogs.microsoft.com/oldnewthing/20060828-18/?p=29953

    (Of course there are cases where it is impractical to avoid bool flags
    -- if a function would otherwise take four flags, we would need sixteen
    functions! -- or there may be other reasons why we might go against that
    design rule.)

    @rhettinger
    Copy link
    Contributor

    Rather than adding a new function, why not adding a parameter

    Guido, the reason iteratools has a separate filter() and filterfalse() tools is because of your API guidance to prefer separate functions rather than having flags in most cases.

    Do you still feel that way and does it apply here? Personally, I would want to have a separate function for fnmatch.filter_false() rather than adding complexity to the API for fnmatch.

    @gvanrossum
    Copy link
    Member

    Yes, I still feel like this and I find it applies here. (Note that the module already has two variants of fnmatch(), so it's nothing new in this context.)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    Status: No status
    Development

    No branches or pull requests

    6 participants