Unicode bug in Itpl when expanding shell variables in syscalls with ! #822

Closed
fperez opened this Issue Sep 29, 2011 · 22 comments

Projects

None yet

4 participants

@fperez
Member
fperez commented Sep 29, 2011

found this today during a presentation, dumping quickly...

In [35]: files
Out[35]: 
['pics:',
 '83547885.jpg',
 '',
 'ppt:',
 'Aiguille du Midi.pps',
 'Ca\xc3\xb1o Cristales.pps',
 'parejas disparejas.ppt',
 'Underwater.pps',
 '',
 'pub:',
 'image_summary.py',
 'stained_glass_barcelona.png',
 'trapezoid_demo.py',
 'trapezoid.py',
 'trap.py',
 'trap.py~']

In [36]: for f in files:
   ....:     !echo $f
   ....:     
pics:
83547885.jpg

ppt:
Aiguille du Midi.pps
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/home/fperez/tmp/ in ()
      1 for f in files:
----> 2     get_ipython().system(u"echo $f")
      3 

/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.pyc in system_raw(self, cmd)
   1999         # a non-None value would trigger :func:`sys.displayhook` calls.

   2000         # Instead, we store the exit_code in user_ns.

-> 2001         self.user_ns['_exit_code'] = os.system(self.var_expand(cmd, depth=2))
   2002 
   2003     # use piped system by default, because it is better behaved


/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.pyc in var_expand(self, cmd, depth)
   2473                           sys._getframe(depth+1).f_locals # locals
   2474                           )
-> 2475         return py3compat.str_to_unicode(str(res), res.codec)
   2476 
   2477     def mktempfile(self, data=None, prefix='ipython_edit_'):

/home/fperez/usr/lib/python2.6/site-packages/IPython/external/Itpl/_Itpl.pyc in __str__(self)
    240     def __str__(self):
    241         """Evaluate and substitute the appropriate parts of the string."""
--> 242         return self._str(self.globals,self.locals)
    243 
    244     def __repr__(self):

/home/fperez/usr/lib/python2.6/site-packages/IPython/external/Itpl/_Itpl.pyc in _str(self, glob, loc)
    197             if live: app(str(eval(chunk,glob,loc)))
    198             else: app(chunk)
--> 199         out = ''.join(result)
    200         try:
    201             return str(out)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
@minrk
Member
minrk commented Sep 29, 2011

Isn't it well established that Itpl just doesn't support unicode in the slightest?

@fperez
Member
fperez commented Sep 30, 2011

Right... I had people in front of me so it was really quick and I didn't have the bandwidth to recall that.

One more reason to push on moving out of Itpl, then. Unicode in the filesystem is more and more common, so this will be more annoying to people as time goes on. I don't know if we'll manage to transition it by 0.12 though.

@takluyver
Member

Just to mention, the FullEvalFormatter that I'd like to use for this is in PR #507 (prompt manager). So I'd prefer to get that merged in before approaching this. But if the prompt manager is going to be delayed until after 0.12, I can copy FullEvalFormatter across and do this first.

@takluyver
Member

@fperez: Is the prompt manager stuff likely to get in for 0.12? If not, I can copy across the FullEvalFormatter code and replace Itpl here with string formatting.

@fperez
Member
fperez commented Oct 16, 2011

Let's see if we can find the time to resolve those. If we can make headway into the harder PRs we have before release, we won't need to. I'd like to tackle them one at a time over the next week, we'll see how it goes.

@minrk
Member
minrk commented Oct 18, 2011

Itpl is actually quite small, and it was easy to get it working, at least for this particular bug. If we do want to keep it, should I keep it str-only, or make it unicode-native, or just leave it like Thomas says, and move to EvalFormatter (losing '$' in the process).

EvalFormatter is definitely much better (and more pythonic) code. But I have a feeling that the people who use the '$' expansion will be very sad to see it gone. Especially if we restore the long lost shell profile.

@fperez
Member
fperez commented Oct 18, 2011

I'd forgotten that itpl is the one that gives us $ expansion, and that is something that is definitely very useful, and that I've demoed multiple times to audiences in addition to using it personally quite regularly.

So that's a good argument for keeping Itpl around then, even if we move to EvalFormatter for most of our internal use. In that case, making it unicode compliant seems like the right way to go, as that will ensure things work OK in py3.

@takluyver
Member

Making $name and $name.attr work should just be a few lines subclass of FullEvalFormatter, so if it allows us to drop ~300 lines of Itpl (which it seems we now need to maintain ourselves), I think it's worth doing.

More complex expressions will need to be written as ${name['item'](args)}, but I don't think that's a show stopper.

@fperez
Member
fperez commented Oct 19, 2011

On Wed, Oct 19, 2011 at 5:10 AM, Thomas
reply@reply.github.com
wrote:

Making $name and $name.attr work should just be a few lines subclass of FullEvalFormatter, so if it allows us to drop ~300 lines of Itpl (which it seems we now need to maintain ourselves), I think it's worth doing.

More complex expressions will need to be written as ${name['item'](args)}, but I don't think that's a show stopper.

+1. I think $n and $n.a are the two main use cases for this, so I'm all for it.

@takluyver
Member

Great - I'll get onto this either when the prompt manager stuff gets merged,
or if we decide that's getting deferred until 0.13.

@fperez
Member
fperez commented Oct 19, 2011

On Wed, Oct 19, 2011 at 9:15 AM, Thomas
reply@reply.github.com
wrote:

Great - I'll get onto this either when the prompt manager stuff gets merged,
or if we decide that's getting deferred until 0.13.

OK. I've been trying to get to the 'meaty' PRs, but the flow of
lightweight ones just won't stop :) Soon we're going to have to
perhaps switch strategies and tackle the meaty ones even if it means
allowing quick ones to pile up, else we'll never clear those which
require more thought and effort, and that I want to have in before
they bit-rot.

@minrk
Member
minrk commented Oct 19, 2011

Sounds good - I'll slow down on the small ones, I was just trying to clean out some of the easy 0.12 issues.

Thomas, how were you thinking adding '$foo' support would work in FullEval? All the actual parsing in handled by the str._formatter_parser method, so it seems like you would essentially have to rewrite the Itpl parse all over again.

@minrk
Member
minrk commented Oct 19, 2011

I should also note that when I was digging into this ( I do already have unicode itpl working ), I discovered a small related bug - os.system doesn't like unicode, so we have to make sure that we encode with unicode_to_str when we pass to it.

@fperez
Member
fperez commented Oct 19, 2011

On Wed, Oct 19, 2011 at 10:00 AM, Min RK
reply@reply.github.com
wrote:

Sounds good - I'll slow down on the small ones, I was just trying to clean out some of the easy 0.12 issues.

I wasn't trying to sound critical, BTW. It's awesome that you
flushed so many of those, and that's why I made an effort to keep up
on the receiving end. So thanks a ton for all that work: in the
aggregate it's actually a lot of significant improvements. But I do
worry about some of the harder ones at the bottom of the PR list. The
longer they linger, the higher the chances of merge conflicts that may
be non-trivial to resolve, and also that we simply forget the context
and details of the discussion (happens to me at least).

@takluyver
Member

On 19 October 2011 18:00, Min RK <
reply@reply.github.com>wrote:

Thomas, how were you thinking adding '$foo' support would work in FullEval?
All the actual parsing in handled by the str._formatter_parser method, so
it seems like you would essentially have to rewrite the Itpl parse all over
again.

Itpl does character by character processing, so you can have complex
expressions with just a $ prefix. But I think, as Fernando says, most of the
usage is the simple cases - $name and $name.attr. If it's OK to require {}
braces around any more complex expressions, then I think we can get by with
a simple regex, and improve performance while we're at it.

string.Formatter has an overridable .parse method, where this logic can go.

@minrk
Member
minrk commented Oct 19, 2011

okay, makes sense.

@fperez
Member
fperez commented Oct 19, 2011

On Wed, Oct 19, 2011 at 10:56 AM, Thomas
reply@reply.github.com
wrote:

If it's OK to require {}
braces around any more complex expressions, then I think we can get by with
a simple regex, and improve performance while we're at it.

It's OK also b/c I think it improves readability. For a complex
expression, the braces help the reader clearly understand what the
expression is, without having to guess the behavior of the underlying
parser (and/or its bugs). So $x, $x.a, ${complicated expr}, seems
like a perfect design brief moving forward (even if it's not precisely
what itpl did historically).

@minrk
Member
minrk commented Oct 21, 2011

So am I correct in understanding that the official plan for this is for Thomas to add $foo support to FullEvalFormatter, and remove Itpl as part of the PromptManager PR?

@fperez
Member
fperez commented Oct 22, 2011

On Fri, Oct 21, 2011 at 2:23 PM, Min RK
reply@reply.github.com
wrote:

So am I correct in understanding that the official plan for this is for Thomas to add $foo support to FullEvalFormatter, and remove Itpl as part of the PromptManager PR?

At least moving over to FullEvalFormatter. I'd hold off on removing
Itpl altogether for a release or two, just in case people use it
through us. But we can stop using it and slap a deprecation warning
on it so anyone importing it from us gets a heads-up.

@fperez fperez closed this in 09c9952 Nov 20, 2011
@fperez
Member
fperez commented Nov 20, 2011

@takluyver, let me know if the test I added in 31ab23f causes any issues in py3. Thanks for the PR!

@stefanv
Contributor
stefanv commented Nov 21, 2011

Should a person have access to environment variables? E.g., I can't do

!echo ${HOME}
@minrk
Member
minrk commented Nov 21, 2011

@stefanv Yes, they should, though the point of this is to allow $HOME to get HOME from the IPython environment. I think the way it used to work was that you would use $$HOME to pass the string $HOME to the system call, though that does not work with this change.

@stefanv stefanv pushed a commit to stefanv/ipython that referenced this issue Nov 30, 2011
@takluyver takluyver Use DollarFormatter to fill in names in ! shell calls.
Closes gh-822
f5687fc
@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014
@takluyver takluyver Use DollarFormatter to fill in names in ! shell calls.
Closes gh-822
8d6fcbd
@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014
@fperez fperez Add failing test for #822, will be fixed next 0435d4f
@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014
@fperez fperez String in test for #822 was meant to be a bytestring, fixes it. bc7c048
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment