Add -g option to %run to glob expand arguments #2165

Merged
merged 13 commits into from Oct 7, 2012

tkf
Contributor

@tkf tkf commented Jul 18, 2012

This allows, e.g.:

%run -g script.py *.txt

@@ -48,6 +49,17 @@
from IPython.utils.timing import clock, clock2
from IPython.utils.warn import warn, error


def globlist(args):
Contributor

This could just be replaced with list(map(glob.glob, args)) below.

Contributor Author

list doesn't concatenate its arguments. If the loop used expanded.append it would be equivalent, but it uses expanded.extend, right? I would need the reduce function to do this in one line... I guess you don't want reduce because IPython needs to support Py3k?

Member

More to the point, code calling reduce is trickier to understand intuitively - that's why it was sequestered into a module for Python 3. I think this function is fine, although I'd describe the result as 'flattened' rather than 'concatenated'.

Contributor Author

I'll change the doc

Contributor

Oh, aha, I missed that we were flattening the lists. My bad. :p
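
To make the distinction in this sub-thread concrete, here is a minimal sketch (not the PR's code) contrasting the nested result of mapping glob.glob over the arguments with the flattened result that an extend loop produces. The names nested_globs, flat_globs and demo, and the file names, are made up for illustration.

```python
import glob
import os
import tempfile


def nested_globs(args):
    """One list of matches per argument, i.e. a list of lists."""
    return list(map(glob.glob, args))


def flat_globs(args):
    """All matches in a single flat list, as an extend loop builds it."""
    expanded = []
    for a in args:
        expanded.extend(glob.glob(a))
    return expanded


def demo():
    """Create a.txt and b.py in a scratch directory and glob both ways."""
    with tempfile.TemporaryDirectory() as td:
        for name in ("a.txt", "b.py"):
            open(os.path.join(td, name), "w").close()
        patterns = [os.path.join(td, "*.txt"), os.path.join(td, "*.py")]
        nested = [[os.path.basename(p) for p in group]
                  for group in nested_globs(patterns)]
        flat = [os.path.basename(p) for p in flat_globs(patterns)]
        return nested, flat
```

Only the flat form can be handed to a script as its argument list, which is why the one-line map does not replace the loop.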

@takluyver
Member

Two questions:

  • Would it make sense to do this by default? %run generally behaves like a command line, so maybe the user expects glob expansion without any special option.
  • How does this interact with Python expressions that get evaluated before the magic command runs? E.g. I can do:
In [3]: %run testargv.py {3*4}
['testargv.py', '12']

I think these expressions should be evaluated before the glob expansion, but it's worth checking.

@tkf
Contributor Author

tkf commented Jul 18, 2012

I thought it was the default and was surprised to see it wasn't. I made it optional only because it breaks backward compatibility. I am +1 for making this the default.

I didn't know about the {} expression! This is really cool. I can't find where the evaluation is done by looking at the magic functions, but it seems to happen when the command line is passed to the magic function, so I guess * is recognized as multiplication when it is inside {}. I checked that the following two commands yield the same output:

%run -g script.py spam*.egg
%run -g script.py {"spam" + "*" * 1 + ".egg"}

@bfroehle
Contributor

Beware that the current pull request is going to eat arguments that aren't filenames:

>>> import glob
>>> glob.glob('crap')
[]
>>> 

It seems to me that we should offer two modes in %run which correspond, roughly, to the shell=True and shell=False in subprocess.Popen.

@takluyver
Member

Yep, I've just checked, the expression evaluation is done before the specific magic function is called. It's the call to var_expand on this line: https://github.com/ipython/ipython/blob/master/IPython/core/interactiveshell.py#L2077

Re making it the default: let's ask the user list what they think.

@bfroehle well spotted. We should check for that case and add the original argument into the list.

@bfroehle
Contributor

So something like:

def glob_args(args):
    out = []
    for arg in args:
        out.extend(glob.glob(arg) or [arg])
    return out
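
A quick way to check that fallback is sketched below. It repeats the glob_args suggestion and runs it against a scratch directory. The demo function and the file and argument names are made up for illustration, and the check assumes the current directory has no files named --verbose or crap.

```python
import glob
import os
import tempfile


def glob_args(args):
    out = []
    for arg in args:
        # An empty match list is falsy, so the original argument is kept.
        out.extend(glob.glob(arg) or [arg])
    return out


def demo():
    """Mix an option, a matching pattern and a non-matching plain word."""
    with tempfile.TemporaryDirectory() as td:
        open(os.path.join(td, "spam.txt"), "w").close()
        pattern = os.path.join(td, "*.txt")
        result = glob_args(["--verbose", pattern, "crap"])
        return [os.path.basename(r) for r in result]
```

With this version, arguments that are not file names pass through untouched instead of being eaten.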

@tkf
Contributor Author

tkf commented Jul 18, 2012

Oops, sorry I didn't notice. Thanks, @bfroehle.

I checked how shells behave when they cannot find a glob match.

bash $ ls spam*
ls: cannot access spam*: No such file or directory

zsh % ls spam*
zsh: no matches found: spam*

I like the way zsh acts. Since we pass an explicit argument to turn expansion on and off, I think raising an error is better, because then the choice is explicit. So, I suggest:

from glob import glob

def globlist(args):
    expanded = []
    pattern = set('*[]?!')
    for a in args:
        if pattern & set(a):
            matches = glob(a)
            if not matches:
                raise RuntimeError("no matches found: {0}".format(a))
            expanded.extend(matches)
        else:
            expanded.append(a)
    return expanded

(yea, it's uglier...)

@takluyver
Member

I'd be inclined to go with the way bash behaves:

  • It means you can easily pass an argument containing a *, and there are occasional reasons to do that.
  • Far more people are familiar with bash semantics than zsh semantics.
  • It's simpler, and simple code is always better (e.g. your set-matching code doesn't account for escaped characters, like \*).

@bfroehle
Contributor

I'm going to agree with @takluyver here, regarding assuming bash-style by default.

If you were going to go with the other format, I'd use the existing glob.has_magic function.

import glob

from IPython.core.error import UsageError

def glob_args(args):
    # fix the name and docstring
    out = []
    for a in args:
        if glob.has_magic(a):
            matches = glob.glob(a)
            if not matches:
                raise UsageError("No matches found: %s" % a)
            out.extend(matches)
        else:
            out.append(a)
    return out

@tkf
Contributor Author

tkf commented Jul 18, 2012

Well, you can pass * explicitly, and more or less easily, by just adding \, provided that globlist handles escaping. But I agree that most people are familiar with bash, so I'd change it as @bfroehle suggested.

@tkf
Contributor Author

tkf commented Jul 18, 2012

Oh, it's in the glob module? I was looking for it in the fnmatch module!

@tkf
Contributor Author

tkf commented Jul 18, 2012

BTW, I was surprised to see the definition:

magic_check = re.compile('[*?[]')

def has_magic(s):
    return magic_check.search(s) is not None

Of course,

In [11]:
glob.has_magic(r'\*')
Out [11]:
True
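
As the session shows, has_magic only looks for *, ? and [ and knows nothing about backslashes. For comparison, Python 3.4 and later provide glob.escape, which protects metacharacters by wrapping them in brackets rather than with a backslash. It did not exist when this PR was written, so the sketch below is a side note, not part of the PR. The demo function and file names are made up for illustration.

```python
import glob
import os
import tempfile


def demo():
    """Glob 'data[1].txt' once as a pattern and once as a literal name."""
    with tempfile.TemporaryDirectory() as td:
        for name in ("data1.txt", "data[1].txt"):
            open(os.path.join(td, name), "w").close()
        # As a pattern, [1] is a character class that matches the digit 1.
        as_pattern = glob.glob(os.path.join(td, "data[1].txt"))
        # glob.escape rewrites the name so the bracket is taken literally.
        as_literal = glob.glob(os.path.join(td, glob.escape("data[1].txt")))
        return ([os.path.basename(p) for p in as_pattern],
                [os.path.basename(p) for p in as_literal])
```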

    """
    expanded = []
    for a in args:
        expanded.extend(glob((a) or [a]))
Member

Careful with brackets here - you're doing glob((a) or [a]), but it should be glob(a) or [a].

@minrk
Member

minrk commented Jul 19, 2012

re: preferences, I agree with @takluyver. I would expect bash-style glob expanding to be the only behavior, without needing a flag.

@tkf
Contributor Author

tkf commented Jul 19, 2012

@takluyver Thanks. The commit 1f7c20b3241398338d5c34940c098cfceb827f89 is amended to pretend that I am not stupid :)

@takluyver
Member

That looks better. Still to do:

  • On by default seems to be the consensus so far. It was pointed out on the mailing list that Windows shells don't do glob expansion, but I think cross-platform consistency is preferable to disabling it on one platform.
  • It should have a test to ensure it keeps working. TemporaryDirectory will be useful here.

@tkf
Contributor Author

tkf commented Jul 19, 2012

Regarding glob escaping: in shells, you can pass a string to a program without expanding it by quoting it (e.g., '*'). I guess this is harder to implement than backslash escaping (\*). I looked through the shlex documentation, but I couldn't find a way to split a string and preserve the quotation:

In [4]:
shlex.split("a 'b' c")
Out [4]:
['a', 'b', 'c']

I guess we should mention the difference from shell glob expansion somewhere in the docstring, unless we implement full emulation.
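
For what it is worth, shlex can keep the quotes if it runs in non-POSIX mode. The sketch below shows that behaviour. The helpers split_keep_quotes and is_quoted are hypothetical names, and non-POSIX mode changes other rules too (for example, backslashes are no longer escape characters), so this is an illustration rather than a drop-in fix for %run.

```python
import shlex


def split_keep_quotes(line):
    """Split like shlex.split, but leave quote characters on the tokens."""
    return shlex.split(line, posix=False)


def is_quoted(token):
    """True if the whole token is wrapped in one matching pair of quotes."""
    return len(token) >= 2 and token[0] == token[-1] and token[0] in "'\""
```

A caller could then skip glob expansion for tokens where is_quoted is true and strip the quotes itself.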

@takluyver
Member

Yes, it should go in the docs somewhere. Although I doubt many people
will actually look it up. Even the docstring for %run is pretty long
already.

@tkf
Contributor Author

tkf commented Jul 19, 2012

I guess they will look it up when they find that their script acts in an unexpected way. (BTW, this is why I prefer the zsh way.)

    # create files
    for p in getpaths(filenames):
        open(p, 'w').close()

Contributor

Probably not necessary to make this a function just to call it once.

for fname in filenames:
    open(os.path.join(td, fname), 'w').close()

Contributor Author

getpaths is used also in assert_match (please see below)

Contributor

Actually, I see now that it was used in assert_match. Regardless, it'd be a lot easier just to chdir into the temporary directory.

save = os.getcwdu()
with TemporaryDirectory() as td:
    os.chdir(td)

    # Create empty files.
    for fname in filenames:
        open(fname, 'w').close()

    assert ...

os.chdir(save)

Contributor Author

If you want to do it safely, you should put it in a context manager or a try-finally clause. I'd put the chdir in TemporaryDirectory if I needed to do that. Should I?

Contributor

Yes, I guess you should put it in a try/finally block.

Contributor Author

I prefer a context manager because it does not contaminate the namespace of the try block. But anyway, it's done.
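
The two options discussed in this sub-thread combine naturally: a context manager that wraps the try/finally. The sketch below is one way to write it and is not the code that went into the PR. The name working_directory is hypothetical, and it uses the Python 3 spelling os.getcwd rather than os.getcwdu.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Change into `path` and always change back, even on error."""
    saved = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        # Runs on normal exit and when the body raises.
        os.chdir(saved)
```

A test would then nest it inside TemporaryDirectory, so the original directory is restored before the temporary one is removed.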

@tkf
Contributor Author

tkf commented Jul 21, 2012

In the last commit, I made the expansion the default because it looks like people prefer it this way. Don't worry: if the consensus reverses, I will just remove the commit.

@takluyver
Member

globlist and its test probably belong in utils - maybe utils.path.

Other than that, I think this is looking pretty good.

@tkf
Contributor Author

tkf commented Jul 23, 2012

Done!

@tkf
Contributor Author

tkf commented Aug 2, 2012

I added a doctest for the %run option parser. I noticed that we need a double escape for glob patterns, because shlex strips unused backslashes:

In [2]:
shlex.split(r"\'\*\\", posix=True)
Out [2]:
["'*\\"]

In [3]:
shlex.split(r"\'\*\\", posix=False)
Out [3]:
["\\'\\*\\\\"]

http://docs.python.org/library/shlex.html#parsing-rules

So, I changed the docstring a little bit saying that you need two backslashes to escape glob patterns. But I don't know if you like this.
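
A small sketch of why the doubling is needed: in POSIX mode shlex consumes one level of backslashes, so an unquoted *, an escaped \* and a quoted '*' all come out as the same token, and only \\* still carries a backslash afterwards. The helper unescape_glob is a hypothetical illustration of a second unescaping step; it is not claimed to be what the PR implements.

```python
import re
import shlex


def unescape_glob(arg):
    """Remove one backslash in front of a glob metacharacter or backslash."""
    return re.sub(r"\\([*?\[\]\\])", r"\1", arg)


def demo():
    """Show what survives shlex for four spellings of a star."""
    return shlex.split(r"* \* '*' \\*")
```

After splitting, only the last token can be told apart from a real wildcard, so it is the only spelling a later glob step could treat as a literal star.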

@takluyver
Member

Hmm, it's not ideal to have to double the backslash. Is there any sensible way to avoid that requirement? I'll have a look later as well.

@takluyver
Member

Test results for commit 66727cb merged into master
Platform: linux2

  • python2.7: OK (libraries not available: oct2py pymongo wx wx.aui)
  • python3.2: OK (libraries not available: oct2py pymongo wx wx.aui)

Not available for testing: python2.6

@tkf
Contributor Author

tkf commented Aug 4, 2012

I found a super hacky way (though it just uses interfaces described in the manual) to get the original string of each token:

In [30]:
def record_returns(original):
    returns = []
    def wrapper(*args, **kwds):
        ret = original(*args, **kwds)
        returns.append(ret)
        return ret
    return (wrapper, returns)

In [34]:
class Proxy(object):
    pass

In [35]:
class MyShlex2(shlex.shlex):

    def __init__(self, *args, **kwds):
        shlex.shlex.__init__(self, *args, **kwds)
        instream = self.instream
        self.instream = Proxy()
        self.instream.readline = instream.readline
        (self.instream.read, self.returns) = record_returns(instream.read)
        self.raw_tokens = []

    def read_token(self):
        ret = shlex.shlex.read_token(self)
        self.raw_tokens.append("".join(self.returns))
        self.returns[:] = []
        return ret

In [36]:
string = r'"a\""'
lex = MyShlex2(string, posix=True)
lex.whitespace_split = True
lex.commenters = ''
list(lex)
Out [36]:
['a"']

In [37]:
lex.raw_tokens
Out [37]:
['"a\\""', '']

So, if lex.raw_tokens[i][0] in ("'", '"'), then the i-th element returned by list(lex) is quoted. But I guess this is too hacky...

@takluyver
Member

Nice going! I'm in two minds about whether behaving like typical shells warrants the extra complexity. Maybe others will chime in.

I think it could also be a little simpler, if instead of record_returns and Proxy we did something like:

class StreamProxy(object):
    def __init__(self, stream):
        self.stream = stream
        self.chunks_read = []

    def read(self, *args, **kwargs):
        ret = self.stream.read(*args, **kwargs)
        self.chunks_read.append(ret)
        return ret

Then the dance in MyShlex2.__init__ becomes self.instream = StreamProxy(self.instream). Note that I haven't tested this, I'm just coding off the top of my head.
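
Putting the two suggestions together, a Python 3 sketch of the idea might look like the following. It is an illustration under assumptions, not code from the PR: the names RawTokenShlex and split_with_quote_flags are made up, StreamProxy gains a readline passthrough because shlex calls readline when skipping comments, and a token such as a'*' that is only partly quoted is reported as unquoted.

```python
import shlex


class StreamProxy(object):
    """Wrap a stream and remember every chunk returned by read()."""

    def __init__(self, stream):
        self.stream = stream
        self.chunks_read = []

    def read(self, *args, **kwargs):
        ret = self.stream.read(*args, **kwargs)
        self.chunks_read.append(ret)
        return ret

    def readline(self, *args, **kwargs):
        # Passthrough only: shlex uses readline() to skip comment lines.
        return self.stream.readline(*args, **kwargs)


class RawTokenShlex(shlex.shlex):
    """A shlex that also records the raw source text behind each token."""

    def __init__(self, *args, **kwargs):
        shlex.shlex.__init__(self, *args, **kwargs)
        self.instream = StreamProxy(self.instream)
        self.raw_tokens = []

    def read_token(self):
        token = shlex.shlex.read_token(self)
        # Everything read since the previous token belongs to this one.
        self.raw_tokens.append("".join(self.instream.chunks_read))
        del self.instream.chunks_read[:]
        return token


def split_with_quote_flags(line):
    """Return (token, was_quoted) pairs for a command line."""
    lex = RawTokenShlex(line, posix=True)
    lex.whitespace_split = True
    lex.commenters = ""
    tokens = list(lex)
    # A raw chunk may start with separating whitespace, so strip it first.
    flags = [raw.lstrip()[:1] in ("'", '"') for raw in lex.raw_tokens]
    # raw_tokens has one extra entry for end of input; zip drops it.
    return list(zip(tokens, flags))
```

A glob step could then expand only the tokens whose flag is False.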

@tkf
Contributor Author

tkf commented Aug 5, 2012

Yea, I think a specific proxy class makes it simpler. I would like to know if this complexity is appropriate here.

@tkf
Contributor Author

tkf commented Aug 5, 2012

I forgot to mention that the approach with a custom shlex class does not solve the problem with the double backslash. I think that problem is much more difficult to solve.

@takluyver
Member

Hmmm, that's annoying. Maybe we should just ignore backslashes and tell people to use quotes to escape wildcards. Otherwise we'll probably end up writing a parser ourselves.

@bfroehle
Contributor

The backslash issue can easily be worked around: we'd just have to write a new arg_split which sets lex.escape = ''. It's a bit surprising to me that this isn't the default in our use anyway. Well, not really, as this would mess up splitting things like "My string with a \" instead of it".

However, I think we need to push the reset button here and first come up with a defined target, then work towards an implementation. In addition we should discuss the current Windows vs. Linux split and whether it is worth maintaining.

As a naive goal, I'd suggest more or less the following mantra: %run [options] filename [args] should function more or less equivalently to $ python [options] filename [args], except that the list of options will be different.

@tkf
Contributor Author

tkf commented Aug 14, 2012

Yes, making [args] act equivalently in %run and python has been my target too.

I'd like to know...

  1. What we do when we cannot implement the shell-equivalent splitting, but can offer a workaround. The double backslash is a good example.
  2. How complex our code can get in order to make [args] equivalent in %run. I guess implementing our own parser is way too much. But what about hacking the stdlib shlex.shlex class?
  3. Whether IPython should behave differently depending on OS (Windows/Linux).

@takluyver
Member

On the last question, my vote is to make it work the same way - the POSIX
way - regardless of OS. Many of our users span more than one OS, and I think
serious command line use is much more common on the *nix platforms than
Windows.

@jdmarch

jdmarch commented Aug 14, 2012

I agree, modulo flexibility on Windows filename paths

@fperez
Member

fperez commented Aug 16, 2012

On Tue, Aug 14, 2012 at 9:01 AM, Bradley M. Froehle <
notifications@github.com> wrote:

As a naive goal, I'd suggest more or less the following mantra: %run
[options] filename [args] should function more or less equivalently to $
python [options] filename [args], except that the list of options will be
different.

+1. That has been the intent since the beginning, and to a first
approximation it indeed works that way (though not perfectly, of course, or
we wouldn't have this issue)

@tkf
Contributor Author

tkf commented Aug 28, 2012

Any comment on my first and second questions?

To repeat, I think there is no way to get rid of the double backslash unless we have a shell parser with glob support.

Note that if we use the custom shlex class I suggested, I think users can write command line arguments for %run that are compatible with the shell, provided that backslashes are used only to escape spaces and backslashes, and quotes are used to escape glob patterns. For example, these two will yield the same result:

python script.py '*' * words\ with\ spaces\ and\ slashes\\
%run script.py '*' * words\ with\ spaces\ and\ slashes\\

whereas this won't:

python script.py \* *   # \* will not be expanded
%run script.py \* *     # \* will be expanded

Without the custom shlex class, the only way to pass a literal * while still expanding the other arguments is a double backslash:

%run script.py \\* *

I don't have strong opinion on adding the custom shlex class. It improves the situation slightly, but I'm OK without it.

What do you think?

@bfroehle
Contributor

bfroehle commented Sep 3, 2012

@tkf Sorry for letting this one go for so long.

Some thoughts:

  1. I don't think there should be an option to disable the glob feature. I think it should always happen (like in the shell), but perhaps with quotation marks of some kind to disable it.
  2. I think that the functionality should probably live in parse_options, to be used as:
opts, arg_lst = self.parse_options(parameter_s, 'nidtN:b:pD:l:rs:T:em:', glob=True)

Is the fundamental problem here that shlex.split doesn't do what we want it to do?

@tkf
Contributor Author

tkf commented Sep 3, 2012

If we can't support full glob expansion as in a shell, I think it's better to have an option to disable it. For example, you can do something like this to ensure that the correct list of files (which may contain glob characters such as *) is passed to your script.

def glob_then_quote(pattern):
    return map(repr, glob(pattern))
%run -G script.py --some-option {glob_then_quote('*')}

Or instead, maybe we can even add a new "--append-argv" option to make sure the correct list is passed to the script:

%run --append-argv=glob('*') script.py --some-option

If we are going to add a glob option to the parser, it should be in the argparse-based one, right? If so, I suggest opening another issue/PR, because you will see a big diff which is unrelated to the main issue here.

Is the fundamental problem here that shlex.split doesn't do what we want it to do?

Yes.

@bfroehle
Contributor

bfroehle commented Sep 3, 2012

Yes, you do point out a good workaround here:

import glob

def addquotes(filename):
    """Quote a filename."""
    # See pipes.quote
    return "'" + filename.replace("'", "'\"'\"'") + "'"

def myglob(s):
    return ' '.join(map(addquotes, glob.glob(s)))

%run script.py {myglob('*')}

A real shell, like bash, performs the argument splitting and globbing before forking and calling exec. The executed program is responsible for parsing the arguments.
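
In current Python the quoting step of that workaround is available as shlex.quote (the successor of pipes.quote). The sketch below checks the property the workaround relies on: quoted names survive a later POSIX-style split unchanged, including names with spaces, quotes or stars. The function quote_args and the sample names are made up for illustration.

```python
import shlex


def quote_args(names):
    """Join names into one command-line string that splits back losslessly."""
    return " ".join(shlex.quote(n) for n in names)


def roundtrip(names):
    """Quote the names, then split the result the way a POSIX parser would."""
    return shlex.split(quote_args(names))
```

Quoting only protects against the splitting step. Whether a later glob step re-expands a name that contains * depends on whether that step can be turned off.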

@tkf
Contributor Author

tkf commented Sep 27, 2012

@bfroehle Glob expansion in {} does not work well when you expand file names containing *, if there is no option to disable glob expansion in the run magic. So I think we need the option to disable it, as in this PR.

Any plan for pulling this PR? I think this PR is still worth pulling although it's not perfect. We can use it until we have a perfect solution (a package which does glob + shlex). See my previous comments for the state of this PR.

@Carreau
Member

Carreau commented Sep 29, 2012

I didn't follow this PR, but there seems to be quite a lot of work here.
Could anyone more involved take a look and decide whether it is worth merging as is, to be refined later?

Sorry, @tkf, for keeping you waiting so long.

@tkf
Contributor Author

tkf commented Oct 5, 2012

BTW, I thought it was too dumb to mention, but you can have a perfect split + glob expansion if you run the system shell every time, like this: 'for s in {0}; do printf "%s\\0" $s; done'.format(line). The downside is that you need something different for Windows.
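
A minimal sketch of that shell-based approach follows, assuming a POSIX sh on the PATH. The names shell_split and demo are hypothetical. Unlike the one-liner, it quotes "$s" inside the loop so each word is printed as-is rather than split and expanded a second time. Note that the line is executed by the shell, so anything in it, including command substitutions, will run.

```python
import os
import subprocess
import tempfile


def shell_split(line, cwd=None):
    """Split and glob-expand `line` by asking a POSIX sh to do it."""
    # "$s" is quoted so each word is emitted exactly once, unmodified.
    script = 'for s in {0}; do printf "%s\\0" "$s"; done'.format(line)
    out = subprocess.check_output(["sh", "-c", script], cwd=cwd)
    # Every word is NUL-terminated, so drop the empty trailing piece.
    return out.decode().split("\0")[:-1]


def demo():
    """Expand a wildcard, a quoted word and an escaped star in a scratch dir."""
    with tempfile.TemporaryDirectory() as td:
        for name in ("a.txt", "b.txt"):
            open(os.path.join(td, name), "w").close()
        return shell_split("*.txt 'x y' \\*", cwd=td)
```

This gets real shell semantics at the cost of a subprocess per call and no Windows support.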

@fperez
Member

fperez commented Oct 6, 2012

FWIW, I'm +1 on this. I don't want that to be the final call, as I haven't been as close to the review as @bfroehle and @takluyver. But the intent of the PR is definitely good, there are tests, and @tkf has done a great job in responding to all the review. The actual new code is simple; it's kind of unfortunate that it hits such a delicate behavior, because the ratio of review/discussion to new code in this PR is pretty brutal. But sometimes, that's how it has to be done :)

@bfroehle
Contributor

bfroehle commented Oct 7, 2012

Thanks @fperez for the review. I think this is ready to go now too. It's not perfect, but it's certainly an improvement and includes tests.

I'm going to merge now and we can make further refinements in later pull requests if needed.

bfroehle added a commit that referenced this pull request Oct 7, 2012
Expand globs (i.e., '*' and '?') in `%run`.  Use `-G` to skip.
@bfroehle bfroehle merged commit e54a60b into ipython:master Oct 7, 2012
@bfroehle
Contributor

bfroehle commented Oct 7, 2012

@tkf Thanks for your patience, persistence, and willingness to make changes to produce a great result in the end.

@tkf
Contributor Author

tkf commented Oct 8, 2012

I was afraid I was pinging too much :) Anyway, thanks for the merge!

@takluyver
Member

Unfortunately this seems to have caused new test failures on Windows - see #2477.

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
Expand globs (i.e., '*' and '?') in `%run`.  Use `-G` to skip.
7 participants