Add a function to escape metacharacters in glob/fnmatch #52649
I have this problem in Python 2.5.4 under Windows: glob.glob("c:\abc\afolderwith[test]\") returns an empty list. |
When you do that, glob treats [test] as a character set and looks for names such as afolderwitht, afolderwithe or afolderwiths. Of course those do not exist, so it returns an empty list. For example:
06:35:05 l0nwlf-MBP:Desktop $ ls -R test
>>> glob.glob('/Users/l0nwlf/Desktop/test[123]/*')
['/Users/l0nwlf/Desktop/test1/alpha', '/Users/l0nwlf/Desktop/test1/beta', '/Users/l0nwlf/Desktop/test1/gamma']
As you can see, given the argument test[123], glob looked for test1, test2 and test3. Since test1 existed, it returned all the files inside it. |
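The same character-set rule can be seen without touching the file system by using fnmatch.filter, which applies glob's matching rules to a plain list of names. The names below are made up for illustration:

```python
import fnmatch

# Hypothetical directory entries, including one whose name literally contains brackets.
names = ["test1", "test2", "test4", "test[123]"]

# "[123]" is a character set: it matches exactly ONE of '1', '2' or '3',
# so "test4" and the literal name "test[123]" are both excluded.
print(fnmatch.filter(names, "test[123]"))  # ['test1', 'test2']
```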
See the explanation at http://docs.python.org/library/fnmatch.html#module-fnmatch; glob uses the same rules. |
OK, what if the name of the directory contains "[]" characters? How do I escape them? |
The documentation for fnmatch.translate, which is what ultimately gets called, says: "There is no way to quote meta-characters." If you want to see this changed, you could open a feature request. If you have a patch, that would help! You probably want to research what the Unix shells use for escaping globs. |
The glob module does not provide what you want. Use os.listdir(r"c:\abc\afolderwith[test]") instead:
07:02:52 l0nwlf-MBP:Desktop $ ls -R test\[123\]/
1 2 3
>>> os.listdir('/Users/l0nwlf/Desktop/test[123]')
['1', '2', '3']
Changing type to 'Feature Request'. |
Well, the listdir doesn't support "wildcard", for example,
On Wed, Apr 14, 2010 at 6:34 PM, Shashwat Anand <report@bugs.python.org> wrote:
|
Well, listdir doesn't support wildcards, for example listdir("*.app"). I know glob does Unix shell-style expansion, but my program runs under Windows: it's a tiny script that walks through a huge directory on my NAS, and many of the directory names contain "[]" and "()" characters. Maybe the only way is to write a filter on top of listdir. |
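The "filter on listdir" idea can be sketched with fnmatch.filter, which applies a shell-style pattern only to the names returned by os.listdir(). The directory path itself is never interpreted as a pattern, so brackets in it are harmless. The helper name list_matching is hypothetical:

```python
import fnmatch
import os


def list_matching(directory, pattern):
    """Hypothetical helper: names in `directory` matching shell-style `pattern`.

    `directory` is passed to os.listdir() unchanged, so "[", "]", "(" and ")"
    in it are taken literally; only `pattern` is treated as a wildcard.
    """
    # os.listdir() returns entries in arbitrary order, so sort for stable output.
    return sorted(fnmatch.filter(os.listdir(directory), pattern))
```

For example, list_matching(r"c:\abc\afolderwith[test]", "*.app") would list the .app entries in that folder without any escaping. Note that fnmatch.filter applies os.path.normcase, so on Windows the match is case-insensitive.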
You repeated the same comment twice and added an 'unnamed' file. I assume you did it by mistake. |
Shouldn't the title be updated to indicate that fnmatch is the true source of the behavior? (I'm basing this on http://docs.python.org/library/glob.html, which says that glob invokes fnmatch.) I'm not using glob but fnmatch, to find filenames that look like "Ajax_[version2].txt". If nothing else, it would have helped me if the documentation stated whether or not the brackets can be escaped. My tests (trying "Ajax_\[version2\].txt" and "Ajax_\\[version2\\].txt") suggest that escaping is not possible, but since the filter pattern gets turned into a regular expression, I think escaping *would* be possible. Is that a reasonable assumption? I'm running 2.5.1 under Windows, and this is my first ever post to the bugs list. |
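A quick check of what the pattern language actually does with backslashes and brackets, using fnmatch.fnmatchcase so the result does not depend on the platform's case normalization. The "[[]" form is the one-character-set workaround discussed later in this thread:

```python
import fnmatch

name = "Ajax_[version2].txt"

# A backslash is NOT an escape character in fnmatch patterns: "\[" is a
# literal backslash followed by the start of a character set.
print(fnmatch.fnmatchcase(name, r"Ajax_\[version2\].txt"))  # False

# Unescaped, "[version2]" is a set matching one of v, e, r, s, i, o, n, 2.
print(fnmatch.fnmatchcase(name, "Ajax_[version2].txt"))  # False

# Wrapping "[" in a one-character set makes it literal; a lone "]" outside
# a set is already literal.
print(fnmatch.fnmatchcase(name, "Ajax_[[]version2].txt"))  # True
```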
Following up... When I looked at the documentation, I did not see this statement anywhere. Is it absent because someone is working on this enhancement? |
I don't think so. That quote came from the docstring for fnmatch.translate:
>>> help(fnmatch.translate)
Help on function translate in module fnmatch:

translate(pat)
    Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters. |
The 3.1.2 doc for fnmatch.translate no longer says "There is no way to quote meta-characters." If that is still true (no quoting method is given that I can see), then that removal is something of a regression. |
The note about no quoting meta-chars is in the docstring for fnmatch.translate, not the documentation. I still see it in 3.1. I have a to-do item to add this to the actual documentation. I'll add an issue. |
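As a workaround, it is possible to turn every glob metacharacter into a character set of one character by wrapping it in [] (for example, "[" becomes "[[]"). The gotcha is that you can't just chain several replaces, because later replacements would escape the brackets inserted by earlier ones. Here is a function adapted from [1] that does all the replacements in a single pass:

import re

def escape_glob(path):
    # Map each glob metacharacter to a one-character set that matches it literally.
    transdict = {
        '[': '[[]',
        ']': '[]]',
        '*': '[*]',
        '?': '[?]',
    }
    # A single alternation regex, so each character is replaced exactly once
    # and the brackets added for one character are never escaped again.
    rc = re.compile('|'.join(map(re.escape, transdict)))
    return rc.sub(lambda m: transdict[m.group(0)], path)

[1] http://www.daniweb.com/software-development/python/code/216636 |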
I agree with answer number 6; the resolution mentioned is quite easy and very effective. |
The attached patch adds support for '\\' escaping to fnmatch, and consequently to glob. |
I have comments on the patch, but a review link does not appear. Could you update your clone to the latest default revision and regenerate the patch? Thanks. |
Noblesse oblige :) |
This is a backward-incompatible change. For example, glob.glob(r'C:\Program Files\*') would break, because the backslashes would start being read as escape characters. As flacs says, a way to escape metacharacters in glob/fnmatch already exists: to match the literal name "Ajax_[version2].txt", use the pattern "Ajax_[[]version2].txt". The documentation should mention this explicitly. It would also be good to add a new fnmatch.escape() function. |
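The compatibility concern can be checked directly: today fnmatch treats a backslash as an ordinary character, which is exactly what Windows-style path patterns rely on. The second comment below describes a hypothetical escaping rule (assuming a backslash escapes whatever character follows it), not any existing behavior:

```python
import fnmatch

pattern = r"C:\Program Files\*"

# Today "\" is an ordinary character, so "*" is still a wildcard and the
# pattern matches any entry under the folder.
print(fnmatch.fnmatchcase(r"C:\Program Files\Python", pattern))  # True

# Under a hypothetical backslash-escaping rule, each "\" would be consumed
# as an escape, so the same pattern would only match the literal string
# "C:Program Files*".
```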
Here is a patch that adds an fnmatch.escape() function. |
I am not sure if escape() should support bytes. translate() doesn't. |
I think the escaping workaround should be documented in the glob and/or fnmatch docs. That way users can simply do:
import glob
glob.glob(r"c:\abc\afolderwith[[]test]\*")
rather than:
import glob
import fnmatch
glob.glob(fnmatch.escape(r"c:\abc\afolderwith[test]") + r"\*")
The function might still be useful with patterns constructed programmatically, but I'm not sure how common the problem really is. |
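Both approaches can be compared on a real folder with a bracketed name. This sketch creates a temporary directory, and the function name find_in_bracketed_folder is hypothetical. The programmatic variant uses glob.escape(), the function that eventually landed in the standard library (Python 3.4+), rather than the proposed fnmatch.escape():

```python
import glob
import os
import tempfile


def find_in_bracketed_folder():
    """Hypothetical demo: glob inside a folder whose name contains brackets.

    Returns the basenames found with (1) the raw path, (2) a hand-escaped
    pattern and (3) a pattern built with glob.escape().
    """
    with tempfile.TemporaryDirectory() as tmp:
        folder = os.path.join(tmp, "afolderwith[test]")
        os.mkdir(folder)
        open(os.path.join(folder, "data.txt"), "w").close()

        def names(pattern):
            return sorted(os.path.basename(p) for p in glob.glob(pattern))

        # Unescaped: "[test]" is a character set, so the folder is not found.
        raw = names(os.path.join(folder, "*"))
        # Escaped by hand; assumes the temp directory path itself contains
        # no glob metacharacters.
        manual = names(os.path.join(tmp, "afolderwith[[]test]", "*"))
        # Escaped programmatically: the whole folder path is made literal.
        auto = names(os.path.join(glob.escape(folder), "*"))
        return raw, manual, auto


print(find_in_bracketed_folder())  # ([], ['data.txt'], ['data.txt'])
```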
See bpo-16240. This issue is left open for the enhancement. |
Patch updated (thanks Ezio for review and comments). |
The workaround is now documented. |
It is good for the stdlib to have a function for escaping special characters, even if the function is simple. There are already escape functions for re and for sgml/xml/html. The private function glob.glob1 is used in Lib/msilib and Tools/msi to prevent unexpected globbing in the parent directory name. |
I've attached fnmatch_implementation.py, which is a simple pure-Python implementation of the fnmatch function. It's not as susceptible to catastrophic backtracking as the current re-based one. For example: fnmatch('a' * 50, '*a*' * 50) completes quickly. |
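The attached file is not reproduced here. As an illustration of the general idea (matching without exponential backtracking), here is the classic greedy two-pointer algorithm for patterns that use only "*" and "?". Character sets are deliberately left out, and the name wildcard_match is hypothetical. On a mismatch it only backtracks to the most recent "*", so its worst-case running time is proportional to len(name) * len(pattern):

```python
def wildcard_match(name, pattern):
    """Hypothetical sketch: match `name` against a pattern of '*' and '?' only.

    Character sets ("[...]") are not supported. Only the most recent '*'
    is ever retried, which bounds the work to O(len(name) * len(pattern)).
    """
    n = p = 0   # current positions in name and pattern
    star = -1   # index of the most recent '*' seen in pattern
    mark = 0    # position in name where that '*' started matching
    while n < len(name):
        if p < len(pattern) and pattern[p] == '*':
            # Record the star; first try letting it match nothing.
            star, mark = p, n
            p += 1
        elif p < len(pattern) and (pattern[p] == '?' or pattern[p] == name[n]):
            n += 1
            p += 1
        elif star != -1:
            # Mismatch: let the last '*' absorb one more character and retry.
            mark += 1
            n = mark
            p = star + 1
        else:
            return False
    # Whatever is left of the pattern must consist only of '*'.
    while p < len(pattern) and pattern[p] == '*':
        p += 1
    return p == len(pattern)


print(wildcard_match('a' * 50, '*a*' * 50))  # True, and it returns immediately
```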
I think it should be a separate issue. |
Escaping for glob on Windows is not so trivial. Special characters in the drive part have no special meaning and should not be escaped, i.e. […]. Here is a patch for glob.escape(). |
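The drive-preserving rule can be sketched with ntpath.splitdrive(), which is importable on any platform: split off the drive, escape only the rest by wrapping each of "*", "?" and "[" in brackets, and reattach the drive unchanged. This is an illustrative sketch with a hypothetical name, not the code from the patch; in Python 3.4 and later, glob.escape() provides this behavior for the current platform's paths:

```python
import ntpath
import re

# The characters glob treats as magic; a lone "]" needs no escaping.
_MAGIC = re.compile(r"([*?[])")


def escape_windows_pattern(path):
    """Hypothetical helper: escape glob metacharacters in a Windows path,
    leaving the drive part (e.g. "c:" or a "\\\\?\\c:" prefix) untouched."""
    drive, rest = ntpath.splitdrive(path)
    return drive + _MAGIC.sub(r"[\1]", rest)


print(escape_windows_pattern(r"c:\abc\afolderwith[test]"))  # c:\abc\afolderwith[[]test]
print(escape_windows_pattern(r"\\?\c:\Quo vadis?.txt"))     # \\?\c:\Quo vadis[?].txt
```

The "?" in the "\\?\" prefix belongs to the drive and is left alone, while the "?" in the file name is escaped.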
Could anyone please review the patch before feature freeze? |
Updated patch addresses Ezio's and Eric's comments. |
Updated patch addresses Eric's comment. |
Looks good to me. |
New changeset 5fda36bff39d by Serhiy Storchaka in branch 'default': |
Thank you Ezio and Eric for your reviews. |