New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add -g option to %run to glob expand arguments #2165
Conversation
@@ -48,6 +49,17 @@ | |||
from IPython.utils.timing import clock, clock2 | |||
from IPython.utils.warn import warn, error | |||
|
|||
|
|||
def globlist(args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could just be replaced with list(map(glob.glob, args))
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list doesn't concatenate its argument. If it is expanded.append
, it is equivalent, but it's expanded.extend
, right? I need the reduce function to do this in one line... I guess you don't want the reduce as IPython needs to support Py3k?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More to the point, code calling reduce
is trickier to understand intuitively - that's why it was sequestered into a module for Python 3. I think this function is fine, although I'd describe the result as 'flattened' rather than 'concatenated'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change the doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, aha, I missed that we were flattening the lists. My bad. :p
Two questions:
I think these expressions should be evaluated before the glob expansion, but it's worth checking. |
I thought it was default and surprised to see it wasn't. I make it optional just because it breaks backward compatibility. I am +1 for making this as default. I didn't know about the
|
Beware that the current pull request is going to eat arguments that aren't filenames: >>> import glob
>>> glob.glob('crap')
[]
>>> It seems to me that we should offer two modes in %run which correspond, roughly, to the |
Yep, I've just checked, the expression evaluation is done before the specific magic function is called. It's the call to Re making it the default: let's ask the user list what they think. @bfroehle well spotted. We should check for that case and add the original argument into the list. |
So something like: def glob_args(args):
out = []
for arg in args:
out.extend(glob.glob(arg) or [arg])
return out |
Oops, sorry I didn't notice. Thanks, @bfroehle. I checked how shell behave when it cannot find the glob match.
I like the way zsh acts. As we pass the explicit argument to turn on and off, I think raising error is better. Because then the choice is explicit. So, I suggest: def globlist(args):
expanded = []
pattern = set('*[]?!')
for a in args:
if pattern & set(a):
matches = glob(a)
if not matches:
raise RuntimeError("no matches found: {0}".format(a))
expanded.extend(matches)
else:
expanded.append(a)
return expanded (yea, it's uglier...) |
I'd be inclined to go with the way bash behaves:
|
I'm going to agree with @takluyver here, regarding assuming bash-style by default. If you were going to go with the other format, I'd use the existing import glob
def glob_args(args):
# fix the name and docstring
out = []
for a in args:
if glob.has_magic(a):
matches = glob.glob(a)
if not matches:
raise UsageError("No matches found: %s" % a)
out.extend(matches)
else:
out.append(a)
return out |
Well, you can pass |
Oh, is it in glob module? I was looking for that in fnmatch module! |
BTW, I was surprised see the definition: magic_check = re.compile('[*?[]')
def has_magic(s):
return magic_check.search(s) is not None Of course, In [11]:
glob.has_magic(r'\*')
Out [11]:
True |
""" | ||
expanded = [] | ||
for a in args: | ||
expanded.extend(glob((a) or [a])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Careful with brackets here - you're doing glob((a) or [a])
, but it should be glob(a) or [a]
.
re: preferences, I agree with @takluyver. I would expect bash-style glob expanding to be the only behavior, without needing a flag. |
@takluyver Thanks. The commit 1f7c20b3241398338d5c34940c098cfceb827f89 is amended to pretend that I am not stupid :) |
That looks better. Still to do:
|
Regarding glob escaping. In shells, you can pass a string to program without expanding it by quoting it (e.g.,
I guess we should mention the difference from shell glob expansion somewhere in the docstring unless implementing full emulation. |
Yes, it should go in the docs somewhere. Although I doubt many people |
I guess they will look up when they find that their script act in an unexpected way. (BTW, this is why I prefer the zsh way.) |
# create files | ||
for p in getpaths(filenames): | ||
open(p, 'w').close() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not necessary to make this a function just to call it once.
for fname in filenames:
open(os.path.join(td, fname), 'w').close()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getpaths is used also in assert_match (please see below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I see now that it was used in assert_match
. Regardless, it'd be a lot easier just to chdir into the temporary directory.
save = os.getcwdu()
with TemporaryDirectory() as td:
os.chdir(td)
# Create empty files.
for fname in filenames:
open(fname, 'w').close()
assert ...
os.chdir(save)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to do it safely, you should put it in the context manager or try-finally clause. I'd put chdir in the TemporaryDirectory, if I need to do that. Should I?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I guess you should put it in try
/ finally
block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer context manager because it does not contaminate the namespace of the try block. But anyway, it's done.
In the last commit, I made the expansion default because it looks like people prefer this way. Don't worry, if the situation is reversed, I will just remove the commit. |
globlist and its test probably belong in utils - maybe utils.path. Other than that, I think this is looking pretty good. |
Done! |
I added doctest for %run option parser. I noticed that we need double escape for escaping glob patterns, because shlex strips unused backslashes: In [2]:
shlex.split(r"\'\*\\", posix=True)
Out [2]:
["'*\\"]
In [3]:
shlex.split(r"\'\*\\", posix=False)
Out [3]:
["\\'\\*\\\\"] http://docs.python.org/library/shlex.html#parsing-rules So, I changed the docstring a little bit saying that you need two backslashes to escape glob patterns. But I don't know if you like this. |
Hmm, it's not ideal to have to double the backslash. Is there any sensible way to avoid that requirement? I'll have a look later as well. |
Test results for commit 66727cb merged into master
Not available for testing: python2.6 |
I found a super hacky way (though it just uses interfaces described in the manual) to get the original string of each token: In [30]:
def record_returns(original):
returns = []
def wrapper(*args, **kwds):
ret = original(*args, **kwds)
returns.append(ret)
return ret
return (wrapper, returns)
In [34]:
class Proxy(object):
pass
In [35]:
class MyShlex2(shlex.shlex):
def __init__(self, *args, **kwds):
shlex.shlex.__init__(self, *args, **kwds)
instream = self.instream
self.instream = Proxy()
self.instream.readline = instream.readline
(self.instream.read, self.returns) = record_returns(instream.read)
self.raw_tokens = []
def read_token(self):
ret = shlex.shlex.read_token(self)
self.raw_tokens.append("".join(self.returns))
self.returns[:] = []
return ret
In [36]:
string = r'"a\""'
lex = MyShlex2(string, posix=True)
lex.whitespace_split = True
lex.commenters = ''
list(lex)
Out [36]:
['a"']
In [37]:
lex.raw_tokens
Out [37]:
['"a\\""', ''] So, if |
Nice going! I'm in two minds about whether behaving like typical shells warrants the extra complexity. Maybe others will chime in. I think it could also be a little simpler, if instead of
Then the dance in |
Yea, I think specific proxy class makes it simpler. I would like to know if this complexity is appropriate here. |
I forgot to mention that the approach with custom shlex class does not solve the problem with the double backslash. I think it's much difficult to solve this problem. |
Hmmm, that's annoying. Maybe we should just ignore backslashes and tell people to use quotes to escape wildcards. Otherwise we'll probably end up writing a parser ourselves. |
However, I think we need to push the reset button here and first come up with a defined target first and then work towards implementation. In addition we should discuss the current Windows vs. Linux split and whether it is worth maintaining. As a naive goal, I'd suggest more or less the following mantra: |
Yes, making I'd like to know...
|
On the last question, my vote is to make it work the same way - the POSIX |
I agree, modulo flexibility on windows filename paths |
On Tue, Aug 14, 2012 at 9:01 AM, Bradley M. Froehle <
|
Any comment on my first and second questions? To repeat, I think there is no way to get rid of double backslash unless we have a shell parser with glob support. Note that if we use the custom shlex class I suggested, I think user can write command line argument for
whereas this won't:
Without the custom shlex class, the only way to passing
I don't have strong opinion on adding the custom shlex class. It improves the situation slightly, but I'm OK without it. What do you think? |
@tkf Sorry for letting this one go for so long. Some thoughts:
Is the fundamental problem here that |
If we can't support full glob expansion as in shell, I think it's better to have an option to disable it. For example, you can do something like this to ensure that correct list of files (which may contain glob characters such as def glob_then_quote(pattern):
return map(repr, glob(pattern))
%run -G script.py --some-option {glob_then_quote('*')} Or instead, maybe we can even add new "--append-argv" option to make sure correct list is passed to the script: %run --append-argv=glob('*') script.py --some-option If we are going to add glob option to parser, it should be in the argparse based one, right? If so, I suggest to open another issue/PR because you will see a big diff which is unrelated to the main issue here.
Yes. |
Yes, you do point out a good workaround here: import glob
def addquotes(filename):
"""Quote a filename."""
# See pipes.quote
return "'" + filename.replace("'", "'\"'\"'") + "'"
def myglob(s):
return ' '.join(map(addquotes, glob.glob(s)))
%run script.py {myglob('*')} A real shell, like bash, performs the argument splitting and globbing before forking and calling exec. The executed program is responsible for parsing the arguments. |
@bfroehle glob expanding in Any plan for pulling this PR? I think this PR is still worth pulling although it's not perfect. We can use it until we have perfect solution (a package which does glob + shlex). See my previous comments for the state of this PR. |
I didn't follow this PR, but there seem to be quite a lot of work here. Sorry to @tkf for having you waiting so long. |
BTW, I though it was too dumb to mention, but you can have a perfect split + glob expansion if you run system shell every time like this |
FWIW, I'm +1 on this. I don't want that to be the final call, as I haven't been as close to the review as @bfroehle and @takluyver. But the intent of the PR is definitely good, there's tests and @tkf has done a great job in responding to all review. The actual new code is simple, it's kind of unfortunate that it hits such a delicate behavior, because the ratio of review/discussion to new code in this PR is pretty brutal. But sometimes, that's how it has to be done :) |
Thanks @fperez for the review. I think this is ready to go now too. It's not perfect, but it's certainly an improvement and includes tests. I'm going to merge now and we can make further refinements in later pull requests if needed. |
Expand globs (i.e., '*' and '?') in `%run`. Use `-G` to skip.
@tkf Thanks for your patience, persistence, and willingness to make changes to produce a great result in the end. |
I was afraid I was pining too much :) Anyway, thanks for the merge! |
Unfortunately this seems to have caused new test failures on Windows - see #2477. |
Expand globs (i.e., '*' and '?') in `%run`. Use `-G` to skip.
This allows, e.g.: