Add -g option to %run to glob expand arguments #2165

tkf · 2012-07-18T18:40:55Z

This allows, e.g.:

%run -g script.py *.txt

bfroehle · 2012-07-18T18:57:43Z

IPython/core/magics/execution.py

@@ -48,6 +49,17 @@
 from IPython.utils.timing import clock, clock2
 from IPython.utils.warn import warn, error

+
+def globlist(args):


This could just be replaced with list(map(glob.glob, args)) below.

list doesn't concatenate its arguments. If the loop used expanded.append, the two would be equivalent, but it uses expanded.extend, right? I would need the reduce function to do this in one line... I guess you don't want reduce, since IPython needs to support Py3k?

More to the point, code calling reduce is trickier to understand intuitively - that's why it was sequestered into a module for Python 3. I think this function is fine, although I'd describe the result as 'flattened' rather than 'concatenated'.

I'll change the doc

Oh, aha, I missed that we were flattening the lists. My bad. :p
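To make the append/extend distinction above concrete, here is a small sketch (not code from the PR). The helper names nested_vs_flat and demo are made up for illustration, and results are sorted only so the output is predictable:

```python
import glob
import os
import tempfile


def nested_vs_flat(patterns):
    """Return (one list per pattern, a single flattened list)."""
    # Same shape as list(map(glob.glob, patterns)): a list of lists.
    nested = [sorted(glob.glob(p)) for p in patterns]
    # Same shape as the extend() loop: one flat list of file names.
    flat = []
    for p in patterns:
        flat.extend(sorted(glob.glob(p)))
    return nested, flat


def demo():
    """Run nested_vs_flat inside a scratch directory with three empty files."""
    save = os.getcwd()
    with tempfile.TemporaryDirectory() as td:
        os.chdir(td)
        try:
            for name in ("a.txt", "b.txt", "c.log"):
                open(name, "w").close()
            return nested_vs_flat(["*.txt", "*.log"])
        finally:
            os.chdir(save)
```

Only the flattened form is usable as an argv-style list, which is why the helper was kept.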

takluyver · 2012-07-18T19:36:46Z

Two questions:

Would it make sense to do this by default? %run generally behaves like a command line, so maybe the user expects glob expansion without any special option.
How does this interact with Python expressions that get evaluated before the magic command runs? E.g. I can do:

In [3]: %run testargv.py {3*4}
['testargv.py', '12']

I think these expressions should be evaluated before the glob expansion, but it's worth checking.

tkf · 2012-07-18T20:04:17Z

I thought it was the default and was surprised to see it wasn't. I made it optional only because it breaks backward compatibility. I am +1 for making this the default.

I didn't know about the {} expression! This is really cool. I can't find where the evaluation is done by looking at the magic functions, but it seems to happen when the command line is passed to the magic function, so I guess * is recognized as multiplication when it is inside {}. I checked that the following two commands yield the same output:

%run -g script.py spam*.egg
%run -g script.py {"spam" + "*" * 1 + ".egg"}

bfroehle · 2012-07-18T20:53:47Z

Beware that the current pull request is going to eat arguments that aren't filenames:

>>> import glob
>>> glob.glob('crap')
[]
>>>

It seems to me that we should offer two modes in %run which correspond, roughly, to the shell=True and shell=False in subprocess.Popen.

takluyver · 2012-07-18T20:56:32Z

Yep, I've just checked, the expression evaluation is done before the specific magic function is called. It's the call to var_expand on this line: https://github.com/ipython/ipython/blob/master/IPython/core/interactiveshell.py#L2077

Re making it the default: let's ask the user list what they think.

@bfroehle well spotted. We should check for that case and add the original argument into the list.

bfroehle · 2012-07-18T21:00:03Z

So something like:

def glob_args(args):
    out = []
    for arg in args:
        out.extend(glob.glob(arg) or [arg])
    return out
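A quick way to see the difference the "or [arg]" fallback makes, as a sketch only: glob_drop mimics the original behavior that eats non-matching arguments, glob_keep follows the suggestion above, and demo is a hypothetical helper that runs both in a scratch directory. Results are sorted only for predictability.

```python
import glob
import os
import tempfile


def glob_drop(args):
    """Expand globs; arguments that match nothing silently disappear."""
    out = []
    for arg in args:
        out.extend(sorted(glob.glob(arg)))
    return out


def glob_keep(args):
    """Expand globs; arguments that match nothing are passed through as-is."""
    out = []
    for arg in args:
        out.extend(sorted(glob.glob(arg)) or [arg])
    return out


def demo(args):
    """Run both variants in a scratch directory containing a.txt and b.txt."""
    save = os.getcwd()
    with tempfile.TemporaryDirectory() as td:
        os.chdir(td)
        try:
            for name in ("a.txt", "b.txt"):
                open(name, "w").close()
            return glob_drop(args), glob_keep(args)
        finally:
            os.chdir(save)
```

With ["--verbose", "*.txt", "12"], the first variant loses both the option and the number, while the second keeps them in place.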

tkf · 2012-07-18T21:13:44Z

Oops, sorry I didn't notice. Thanks, @bfroehle.

I checked how shells behave when they cannot find a glob match.

bash $ ls spam*
ls: cannot access spam*: No such file or directory

zsh % ls spam*
zsh: no matches found: spam*

I like the way zsh acts. Since expansion is turned on and off by an explicit option, I think raising an error is better, because then the choice is explicit. So, I suggest:

from glob import glob

def globlist(args):
    expanded = []
    pattern = set('*[]?!')
    for a in args:
        if pattern & set(a):
            matches = glob(a)
            if not matches:
                raise RuntimeError("no matches found: {0}".format(a))
            expanded.extend(matches)
        else:
            expanded.append(a)
    return expanded

(yea, it's uglier...)

takluyver · 2012-07-18T21:22:07Z

I'd be inclined to go with the way bash behaves:

It means you can easily pass an argument containing a *, and there are occasional reasons to do that.
Far more people are familiar with bash semantics than zsh semantics.
It's simpler, and simple code is always better (e.g. your set-matching code doesn't account for escaped characters, like \*).

bfroehle · 2012-07-18T21:28:21Z

I'm going to agree with @takluyver here, regarding assuming bash-style by default.

If you were going to go with the other format, I'd use the existing glob.has_magic function.

import glob
from IPython.core.error import UsageError
def glob_args(args):
    # fix the name and docstring
    out = []
    for a in args:
        if glob.has_magic(a):
            matches = glob.glob(a)
            if not matches:
                raise UsageError("No matches found: %s" % a)
            out.extend(matches)
        else:
            out.append(a)
    return out

tkf · 2012-07-18T21:39:03Z

Well, you can pass * explicitly, more or less easily, by just adding \, provided that globlist handles escaping. But I agree that most people are familiar with bash, so I'll change it as @bfroehle suggested.

tkf · 2012-07-18T21:41:02Z

Oh, is it in glob module? I was looking for that in fnmatch module!

tkf · 2012-07-18T21:43:43Z

BTW, I was surprised to see the definition:

magic_check = re.compile('[*?[]')

def has_magic(s):
    return magic_check.search(s) is not None

Of course,

In [11]: glob.has_magic(r'\*')
Out[11]: True
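For comparison, an escape-aware check could look roughly like the sketch below. The name has_unescaped_magic is hypothetical (it is not part of the glob module or of this PR), and it assumes that a backslash always escapes the character that follows it:

```python
def has_unescaped_magic(s):
    """Return True if s contains *, ? or [ that is not preceded by a backslash escape."""
    i = 0
    while i < len(s):
        if s[i] == '\\':
            # Skip the backslash and the character it escapes.
            i += 2
            continue
        if s[i] in '*?[':
            return True
        i += 1
    return False
```

Under that assumption, r'\*' is treated as a literal star, while r'\\*' (an escaped backslash followed by a star) still counts as a pattern.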

takluyver · 2012-07-18T23:17:25Z

IPython/core/magics/execution.py

+    """
+    expanded = []
+    for a in args:
+        expanded.extend(glob((a) or [a]))


Careful with brackets here - you're doing glob((a) or [a]), but it should be glob(a) or [a].

minrk · 2012-07-19T02:46:39Z

re: preferences, I agree with @takluyver. I would expect bash-style glob expanding to be the only behavior, without needing a flag.

tkf · 2012-07-19T10:27:58Z

@takluyver Thanks. The commit 1f7c20b3241398338d5c34940c098cfceb827f89 is amended to pretend that I am not stupid :)

takluyver · 2012-07-19T10:38:10Z

That looks better. Still to do:

On by default seems to be the consensus so far. It was pointed out on the mailing list that Windows shells don't do glob expansion, but I think cross-platform consistency is preferable to disabling it on one platform.
It should have a test to ensure it keeps working. TemporaryDirectory will be useful here.

tkf · 2012-07-19T10:40:33Z

Regarding glob escaping: in shells, you can pass a string to a program without expanding it by quoting it (e.g., '*'). I guess this is harder to implement than backslash escaping (\*). I looked through the shlex documentation but couldn't find a way to split a string and preserve the quotation:

In [4]: shlex.split("a 'b' c")
Out[4]: ['a', 'b', 'c']

I guess we should mention the difference from shell glob expansion somewhere in the docstring unless implementing full emulation.
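One possible angle, offered only as a sketch and not as what this PR does: in non-POSIX mode shlex.split leaves the quote characters on the tokens, so quoted arguments can be told apart and excluded from expansion. The names split_and_glob and demo are invented for this example, and the non-POSIX splitter handles backslashes and nested quotes differently from a real shell.

```python
import glob
import os
import shlex
import tempfile


def split_and_glob(line):
    """Split line, expand unquoted tokens, and keep quoted tokens literal."""
    out = []
    for tok in shlex.split(line, posix=False):
        quoted = len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "'\""
        if quoted:
            # Strip the surrounding quotes and skip glob expansion.
            out.append(tok[1:-1])
        else:
            # bash-like: keep the token itself when nothing matches.
            out.extend(sorted(glob.glob(tok)) or [tok])
    return out


def demo(line):
    """Run split_and_glob in a scratch directory containing a.txt and b.txt."""
    save = os.getcwd()
    with tempfile.TemporaryDirectory() as td:
        os.chdir(td)
        try:
            for name in ("a.txt", "b.txt"):
                open(name, "w").close()
            return split_and_glob(line)
        finally:
            os.chdir(save)
```

In that scratch directory, "'*.txt' *.txt" gives the literal pattern followed by the two matching files.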

takluyver · 2012-07-19T10:46:29Z

Yes, it should go in the docs somewhere. Although I doubt many people
will actually look it up. Even the docstring for %run is pretty long
already.

tkf · 2012-07-19T10:59:53Z

I guess they will look it up when they find that their script acts in an unexpected way. (BTW, this is why I prefer the zsh way.)

bfroehle · 2012-07-20T22:55:06Z

IPython/core/tests/test_magic.py

+        # create files
+        for p in getpaths(filenames):
+            open(p, 'w').close()
+


Probably not necessary to make this a function just to call it once.

for fname in filenames: open(os.path.join(td, fname), 'w').close()

getpaths is used also in assert_match (please see below)

Actually, I see now that it was used in assert_match. Regardless, it'd be a lot easier just to chdir into the temporary directory.

save = os.getcwdu()
with TemporaryDirectory() as td:
    os.chdir(td)
    # Create empty files.
    for fname in filenames:
        open(fname, 'w').close()
    assert ...
    os.chdir(save)

If you want to do it safely, you should put it in the context manager or try-finally clause. I'd put chdir in the TemporaryDirectory, if I need to do that. Should I?

Yes, I guess you should put it in try / finally block.

I prefer context manager because it does not contaminate the namespace of the try block. But anyway, it's done.
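For reference, the context-manager variant discussed here could be sketched as follows. This is an illustration rather than the code in the PR: in_tempdir is a made-up name, and it uses the standard library's tempfile.TemporaryDirectory and os.getcwd (Python 3) rather than IPython's own TemporaryDirectory helper and os.getcwdu.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def in_tempdir():
    """Create a temporary directory, chdir into it, and always restore the old cwd."""
    save = os.getcwd()
    with tempfile.TemporaryDirectory() as td:
        os.chdir(td)
        try:
            yield td
        finally:
            # Leave the directory before it is removed, even if the body raised.
            os.chdir(save)
```

A test body can then create files with bare names inside "with in_tempdir():" and the working directory is restored even when an assertion fails.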

tkf · 2012-07-21T00:01:22Z

In the last commit, I made the expansion default because it looks like people prefer this way. Don't worry, if the situation is reversed, I will just remove the commit.

takluyver · 2012-07-23T13:27:10Z

globlist and its test probably belong in utils - maybe utils.path.

Other than that, I think this is looking pretty good.

tkf · 2012-07-23T15:09:29Z

Done!

tkf · 2012-08-02T14:21:53Z

I added a doctest for the %run option parser. I noticed that we need a double backslash to escape glob patterns, because shlex strips unused backslashes:

In [2]: shlex.split(r"\'\*\\", posix=True)
Out[2]: ["'*\\"]

In [3]: shlex.split(r"\'\*\\", posix=False)
Out[3]: ["\\'\\*\\\\"]

http://docs.python.org/library/shlex.html#parsing-rules

So, I changed the docstring a little bit saying that you need two backslashes to escape glob patterns. But I don't know if you like this.
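The effect on a %run-style argument line can be seen with plain shlex, independent of IPython. A minimal sketch, where split_args is just an illustrative wrapper:

```python
import shlex


def split_args(line):
    """Split a command line the POSIX way, as shlex.split does by default."""
    return shlex.split(line, posix=True)


# A single backslash is consumed by shlex, so a glob step would see a bare '*'.
single = split_args(r"script.py \*")

# A doubled backslash survives as one backslash, leaving '\*' for an unescape step.
double = split_args(r"script.py \\*")
```

Here single is ['script.py', '*'] and double is ['script.py', '\\*'], which is why the docstring has to ask for two backslashes.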

takluyver · 2012-08-02T14:29:56Z

Hmm, it's not ideal to have to double the backslash. Is there any sensible way to avoid that requirement? I'll have a look later as well.

takluyver · 2012-08-02T14:33:26Z

Test results for commit 66727cb merged into master
Platform: linux2

python2.7: OK (libraries not available: oct2py pymongo wx wx.aui)
python3.2: OK (libraries not available: oct2py pymongo wx wx.aui)

Not available for testing: python2.6

tkf · 2012-08-04T00:51:25Z

I found a super hacky way (though it just uses interfaces described in the manual) to get the original string of each token:

In [30]:
def record_returns(original):
    returns = []
    def wrapper(*args, **kwds):
        ret = original(*args, **kwds)
        returns.append(ret)
        return ret
    return (wrapper, returns)

In [34]:
class Proxy(object):
    pass

In [35]:
class MyShlex2(shlex.shlex):

    def __init__(self, *args, **kwds):
        shlex.shlex.__init__(self, *args, **kwds)
        instream = self.instream
        self.instream = Proxy()
        self.instream.readline = instream.readline
        (self.instream.read, self.returns) = record_returns(instream.read)
        self.raw_tokens = []

    def read_token(self):
        ret = shlex.shlex.read_token(self)
        self.raw_tokens.append("".join(self.returns))
        self.returns[:] = []
        return ret

In [36]:
string = r'"a\""'
lex = MyShlex2(string, posix=True)
lex.whitespace_split = True
lex.commenters = ''
list(lex)
Out [36]:
['a"']

In [37]:
lex.raw_tokens
Out [37]:
['"a\\""', '']

So, if lex.raw_tokens[i][0] in ("'", '"') then i-th element returned by list(lex) is quoted. But I guess this is too hacky...

takluyver · 2012-08-05T14:35:16Z

Nice going! I'm in two minds about whether behaving like typical shells warrants the extra complexity. Maybe others will chime in.

I think it could also be a little simpler, if instead of record_returns and Proxy we did something like:

class StreamProxy(object):
    def __init__(self, stream):
        self.stream = stream
        self.chunks_read = []

    def read(self, *args, **kwargs):
        ret = self.stream.read(*args, **kwargs)
        self.chunks_read.append(ret)
        return ret

Then the dance in MyShlex2.__init__ becomes self.instream = StreamProxy(self.instream). Note that I haven't tested this, I'm just coding off the top of my head.
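Putting the two pieces together, a version that runs on Python 3 might look like the sketch below. It is an assumption-laden illustration, not code from the PR: RawTokenShlex and split_with_quote_flags are invented names, the proxy forwards any other attribute (such as readline) to the wrapped stream, and a token is treated as quoted when its raw text starts with a quote character.

```python
import io
import shlex


class StreamProxy(object):
    """Wrap a stream and remember every chunk returned by read()."""

    def __init__(self, stream):
        self.stream = stream
        self.chunks_read = []

    def read(self, *args, **kwargs):
        ret = self.stream.read(*args, **kwargs)
        self.chunks_read.append(ret)
        return ret

    def __getattr__(self, name):
        # Forward everything else (readline, close, ...) to the real stream.
        return getattr(self.stream, name)


class RawTokenShlex(shlex.shlex):
    """shlex subclass that records the raw text consumed for each token."""

    def __init__(self, text):
        shlex.shlex.__init__(self, StreamProxy(io.StringIO(text)), posix=True)
        self.whitespace_split = True
        self.commenters = ''
        self.raw_tokens = []

    def read_token(self):
        token = shlex.shlex.read_token(self)
        self.raw_tokens.append(''.join(self.instream.chunks_read))
        del self.instream.chunks_read[:]
        return token


def split_with_quote_flags(text):
    """Return (token, was_quoted) pairs for each argument in text."""
    lex = RawTokenShlex(text)
    tokens = list(lex)
    raws = lex.raw_tokens[:len(tokens)]  # drop the empty entry recorded at EOF
    return [(tok, raw.lstrip()[:1] in ('"', "'")) for tok, raw in zip(tokens, raws)]
```

A caller could then apply glob expansion only to the tokens whose flag is False.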

tkf · 2012-08-05T15:01:39Z

Yea, I think specific proxy class makes it simpler. I would like to know if this complexity is appropriate here.

tkf · 2012-08-05T17:58:54Z

I forgot to mention that the approach with a custom shlex class does not solve the problem with the double backslash. I think that problem is much more difficult to solve.

takluyver · 2012-08-05T18:59:48Z

Hmmm, that's annoying. Maybe we should just ignore backslashes and tell people to use quotes to escape wildcards. Otherwise we'll probably end up writing a parser ourselves.

bfroehle · 2012-08-14T16:01:53Z

The backslash issue can easily be worked around: we'd just have to write a new arg_split which sets lex.escape = ''. It's a bit surprising to me that this isn't the default in our use anyway. Well, not really, as this would mess up splitting things like "My string with a \" instead of it".
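A rough sketch of what such a splitter could look like, using only documented shlex attributes. The function name arg_split_no_escape is hypothetical and this is not IPython's arg_split:

```python
import shlex


def arg_split_no_escape(line):
    """Split line like a POSIX shell, but treat backslashes as ordinary characters."""
    lex = shlex.shlex(line, posix=True)
    lex.whitespace_split = True  # split on whitespace only
    lex.commenters = ''          # '#' does not start a comment
    lex.escape = ''              # backslash no longer escapes anything
    return list(lex)
```

With this, r"script.py \* a" keeps the backslash, so a later glob step can tell \* from *. The cost is the one noted above: a backslash-escaped quote inside a double-quoted string is no longer understood.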

However, I think we need to push the reset button here and first come up with a defined target, then work towards an implementation. In addition we should discuss the current Windows vs. Linux split and whether it is worth maintaining.

As a naive goal, I'd suggest more or less the following mantra: %run [options] filename [args] should function more or less equivalently to $ python [options] filename [args], except that the list of options will be different.

tkf · 2012-08-14T16:32:47Z

Yes, making [args] act equivalently in %run and python has been my target too.

I'd like to know...

What we do when we cannot implement the shell-equivalent splitting, but can offer a workaround. The double backslash is a good example.
How complex our code can get in order to make [args] equivalent in %run. I guess implementing our own parser is way too much. But what about hacking the stdlib shlex.shlex class?
Whether IPython should behave differently depending on the OS (Windows/Linux).

takluyver · 2012-08-14T17:00:58Z

On the last question, my vote is to make it work the same way - the POSIX
way - regardless of OS. Many of our users span more than one OS, and I think
serious command line use is much more common on the *nix platforms than
Windows.

jdmarch · 2012-08-14T17:08:37Z

I agree, modulo flexibility on windows filename paths

fperez · 2012-08-16T03:53:05Z

On Tue, Aug 14, 2012 at 9:01 AM, Bradley M. Froehle <
notifications@github.com> wrote:

As a naive goal, I'd suggest more or less the following mantra: %run
[options] filename [args] should function more or less equivalently to $
python [options] filename [args], except that the list of options will be
different.

+1. That has been the intent since the beginning, and to a first
approximation it indeed works that way (though not perfectly, of course, or
we wouldn't have this issue)

tkf · 2012-08-28T17:47:56Z

Any comment on my first and second questions?

To repeat, I think there is no way to get rid of double backslash unless we have a shell parser with glob support.

Note that if we use the custom shlex class I suggested, I think users can write command line arguments for %run that are compatible with the shell, provided that backslashes are used only to escape spaces and backslashes, and quotes are used to escape glob patterns. For example, these will yield the same result:

python script.py '*' * words\ with\ spaces\ and\ slashes\\
%run script.py '*' * words\ with\ spaces\ and\ slashes\\

whereas this won't:

python script.py \* *   # \* will not be expanded
%run script.py \* *     # \* will be expanded

Without the custom shlex class, the only way to pass a literal * while expanding other arguments is the double backslash:

%run script.py \\* *

I don't have strong opinion on adding the custom shlex class. It improves the situation slightly, but I'm OK without it.

What do you think?

bfroehle · 2012-09-03T19:08:36Z

@tkf Sorry for letting this one go for so long.

Some thoughts:

I don't think there should be an option to disable the glob feature. I think it should happen always (like in the shell), but perhaps with using quotation marks of some kind to disable it.
I think that the functionality should probably live in parse_options, to be used as:

opts, arg_lst = self.parse_options(parameter_s, 'nidtN:b:pD:l:rs:T:em:', glob=True)

Is the fundamental problem here that shlex.split doesn't do what we want it to do?

tkf · 2012-09-03T20:55:27Z

If we can't support full glob expansion as in shell, I think it's better to have an option to disable it. For example, you can do something like this to ensure that correct list of files (which may contain glob characters such as *) is passed to your script.

def glob_then_quote(pattern):
    return map(repr, glob(pattern))
%run -G script.py --some-option {glob_then_quote('*')}

Or instead, maybe we can even add new "--append-argv" option to make sure correct list is passed to the script:

%run --append-argv=glob('*') script.py --some-option

If we are going to add glob option to parser, it should be in the argparse based one, right? If so, I suggest to open another issue/PR because you will see a big diff which is unrelated to the main issue here.

Is the fundamental problem here that shlex.split doesn't do what we want it to do?

Yes.

bfroehle · 2012-09-03T21:12:26Z

Yes, you do point out a good workaround here:

import glob

def addquotes(filename):
    """Quote a filename."""
    # See pipes.quote
    return "'" + filename.replace("'", "'\"'\"'") + "'"

def myglob(s):
    return ' '.join(map(addquotes, glob.glob(s)))

%run script.py {myglob('*')}

A real shell, like bash, performs the argument splitting and globbing before forking and calling exec. The executed program is responsible for parsing the arguments.
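On Python 3.3 and later, the quoting half of this workaround can lean on shlex.quote, the successor of pipes.quote. A small sketch, with quote_all as an invented helper name, showing that quoted names survive a shlex round trip even when they contain spaces, quotes or glob characters:

```python
import shlex


def quote_all(filenames):
    """Join file names into one shell-style string, quoting each as needed."""
    return ' '.join(shlex.quote(name) for name in filenames)


names = ["plain.txt", "with space.txt", "star*.txt", "it's.txt"]
line = quote_all(names)

# Splitting the quoted line recovers exactly the original names.
round_trip = shlex.split(line)
```

Whether the literal * in such a name is then re-expanded still depends on the glob step that follows the split.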

tkf · 2012-09-27T00:30:39Z

@bfroehle glob expanding in {} does not work well when you are expanding file names containing *, if there is no option to disable glob expansion in the run magic. So I think we need the option to disable it, as in this PR.

Any plan for pulling this PR? I think this PR is still worth pulling although it's not perfect. We can use it until we have perfect solution (a package which does glob + shlex). See my previous comments for the state of this PR.

Carreau · 2012-09-29T12:57:43Z

I didn't follow this PR, but there seems to be quite a lot of work here.
Could anyone more involved take a look and decide whether it is worth merging as is and maybe refining later?

Sorry to @tkf for having you waiting so long.

tkf · 2012-10-05T15:49:10Z

BTW, I thought it was too dumb to mention, but you can have a perfect split + glob expansion if you run the system shell every time, like this: 'for s in {0}; do printf "%s\\0" $s; done'.format(line). The downside is that you need something different for Windows.
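Spelled out, the idea might look like the sketch below. It assumes a POSIX sh on the PATH, the helper name shell_split is made up, and unlike the one-liner above it quotes "$s" so that expanded names containing spaces are not split again. Because line is handed to the shell verbatim, anything in it (command substitution included) is executed, so this only suits trusted input.

```python
import subprocess


def shell_split(line):
    """Let the system shell split and glob-expand line; return the arguments."""
    # The loop prints each resulting word followed by a NUL byte.
    script = "for s in {0}; do printf '%s\\0' \"$s\"; done".format(line)
    out = subprocess.run(
        ["sh", "-c", script], stdout=subprocess.PIPE, check=True
    ).stdout
    # Drop the empty piece after the final NUL.
    return out.decode().split("\0")[:-1]
```

Quoting, backslash escapes and glob patterns are then handled by the shell itself, relative to the current working directory.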

fperez · 2012-10-06T08:04:13Z

FWIW, I'm +1 on this. I don't want that to be the final call, as I haven't been as close to the review as @bfroehle and @takluyver. But the intent of the PR is definitely good, there's tests and @tkf has done a great job in responding to all review. The actual new code is simple, it's kind of unfortunate that it hits such a delicate behavior, because the ratio of review/discussion to new code in this PR is pretty brutal. But sometimes, that's how it has to be done :)

bfroehle · 2012-10-07T17:10:57Z

Thanks @fperez for the review. I think this is ready to go now too. It's not perfect, but it's certainly an improvement and includes tests.

I'm going to merge now and we can make further refinements in later pull requests if needed.

Expand globs (i.e., '*' and '?') in `%run`. Use `-G` to skip.

bfroehle · 2012-10-07T17:14:04Z

@tkf Thanks for your patience, persistence, and willingness to make changes to produce a great result in the end.

tkf · 2012-10-08T07:58:11Z

I was afraid I was pinging too much :) Anyway, thanks for the merge!

takluyver · 2012-10-09T08:27:06Z

Unfortunately this seems to have caused new test failures on Windows - see #2477.

Add -g option to %run to glob expand arguments

a6142b7

bfroehle reviewed Jul 18, 2012

Tiny improvement on globlist docstring

4ba4800

takluyver reviewed Jul 18, 2012

Do not discard unmatched glob patterns in globlist

832b089

Add test_globlist

1f48c40

bfroehle reviewed Jul 20, 2012

tkf added 2 commits July 21, 2012 01:52

Change directory to simplify test_globlist

a5138a3

as @bfroehle suggested

Make glob expansion default in %run magic command

afb25d0

Move globlist and its test under utils.path

a7c450c

Fix unescape_glob: support escaping "\"

2446f12

bfroehle added a commit that referenced this pull request Oct 7, 2012

Merge pull request #2165 from tkf/run-glob

e54a60b

Expand globs (i.e., '*' and '?') in `%run`. Use `-G` to skip.

bfroehle merged commit e54a60b into ipython:master Oct 7, 2012

takluyver mentioned this pull request Oct 9, 2012

Glob expansion tests fail on Windows #2477

Closed

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

Merge pull request ipython#2165 from tkf/run-glob

a058b83

Expand globs (i.e., '*' and '?') in `%run`. Use `-G` to skip.
