Unicode bug in Itpl when expanding shell variables in syscalls with ! #822

Closed
fperez opened this Issue Sep 29, 2011 · 22 comments


@fperez
IPython member

found this today during a presentation, dumping quickly...

In [35]: files
Out[35]: 
['pics:',
 '83547885.jpg',
 '',
 'ppt:',
 'Aiguille du Midi.pps',
 'Ca\xc3\xb1o Cristales.pps',
 'parejas disparejas.ppt',
 'Underwater.pps',
 '',
 'pub:',
 'image_summary.py',
 'stained_glass_barcelona.png',
 'trapezoid_demo.py',
 'trapezoid.py',
 'trap.py',
 'trap.py~']

In [36]: for f in files:
   ....:     !echo $f
   ....:     
pics:
83547885.jpg

ppt:
Aiguille du Midi.pps
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/home/fperez/tmp/ in ()
      1 for f in files:
----> 2     get_ipython().system(u"echo $f")
      3 

/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.pyc in system_raw(self, cmd)
   1999         # a non-None value would trigger :func:`sys.displayhook` calls.
   2000         # Instead, we store the exit_code in user_ns.
-> 2001         self.user_ns['_exit_code'] = os.system(self.var_expand(cmd, depth=2))
   2002 
   2003     # use piped system by default, because it is better behaved

/home/fperez/usr/lib/python2.6/site-packages/IPython/core/interactiveshell.pyc in var_expand(self, cmd, depth)
   2473                           sys._getframe(depth+1).f_locals # locals
   2474                           )
-> 2475         return py3compat.str_to_unicode(str(res), res.codec)
   2476 
   2477     def mktempfile(self, data=None, prefix='ipython_edit_'):

/home/fperez/usr/lib/python2.6/site-packages/IPython/external/Itpl/_Itpl.pyc in __str__(self)
    240     def __str__(self):
    241         """Evaluate and substitute the appropriate parts of the string."""
--> 242         return self._str(self.globals,self.locals)
    243 
    244     def __repr__(self):

/home/fperez/usr/lib/python2.6/site-packages/IPython/external/Itpl/_Itpl.pyc in _str(self, glob, loc)
    197             if live: app(str(eval(chunk,glob,loc)))
    198             else: app(chunk)
--> 199         out = ''.join(result)
    200         try:
    201             return str(out)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
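
For readers following the traceback: the failing step is an implicit ASCII decode of UTF-8 bytes. Below is a minimal sketch of the same failure, written for Python 3, where the decode has to be spelled out (in Python 2 it happens implicitly when byte strings and unicode strings are mixed). The variable names are illustrative only and do not come from Itpl.

```python
# UTF-8 encoded bytes for one of the filenames in the listing above.
raw = b'Ca\xc3\xb1o Cristales.pps'

try:
    # Byte 0xc3 is outside the ASCII range, so this decode fails.
    raw.decode('ascii')
except UnicodeDecodeError as err:
    message = str(err)
    print(message)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw.decode('utf-8')
print(decoded)
```

The error message names the same byte and position as the traceback, which suggests the filename reached Itpl as UTF-8 bytes and was then decoded as ASCII.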
@minrk
IPython member

Isn't it well established that Itpl just doesn't support unicode in the slightest?

@fperez
IPython member

Right... I had people in front of me so it was really quick and I didn't have the bandwidth to recall that.

One more reason to push on moving out of Itpl, then. Unicode in the filesystem is more and more common, so this will be more annoying to people as time goes on. I don't know if we'll manage to transition it by 0.12 though.

@takluyver
IPython member

Just to mention, the FullEvalFormatter that I'd like to use for this is in PR #507 (prompt manager). So I'd prefer to get that merged in before approaching this. But if the prompt manager is going to be delayed until after 0.12, I can copy FullEvalFormatter across and do this first.

@takluyver
IPython member

@fperez: Is the prompt manager stuff likely to get in for 0.12? If not, I can copy across the FullEvalFormatter code and replace Itpl here with string formatting.

@fperez
IPython member

Let's see if we can find the time to resolve those. If we can make headway into the harder PRs we have before release, we won't need to. I'd like to tackle them one at a time over the next week, we'll see how it goes.

@minrk
IPython member

Itpl is actually quite small, and it was easy to get it working, at least for this particular bug. If we do want to keep it, should I keep it str-only, make it unicode-native, or drop it as Thomas suggests and move to EvalFormatter (losing '$' in the process)?

EvalFormatter is definitely much better (and more pythonic) code. But I have a feeling that the people who use the '$' expansion will be very sad to see it gone. Especially if we restore the long lost shell profile.

@fperez
IPython member

I'd forgotten that itpl is the one that gives us $ expansion, and that is something that is definitely very useful, and that I've demoed multiple times to audiences in addition to using it personally quite regularly.

So that's a good argument for keeping Itpl around then, even if we move to EvalFormatter for most of our internal use. In that case, making it unicode compliant seems like the right way to go, as that will ensure things work OK in py3.

@takluyver
IPython member

Making $name and $name.attr work should just be a few-line subclass of FullEvalFormatter, so if it allows us to drop ~300 lines of Itpl (which it seems we now need to maintain ourselves), I think it's worth doing.

More complex expressions will need to be written as ${name['item'](args)}, but I don't think that's a show stopper.
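
A rough sketch of what such a subclass might look like, built directly on the standard library's `string.Formatter` rather than on FullEvalFormatter. This is not the code from the PR: the class name `DollarSketchFormatter`, its regular expression, and the use of `eval` on field names are all assumptions made for illustration. It handles `$name`, `$name.attr`, and `{expression}` only.

```python
import re
from string import Formatter


class DollarSketchFormatter(Formatter):
    """Hypothetical sketch: expand $name, $name.attr and {expression}."""

    # Lazily matched literal text, then "$" followed by a dotted name.
    _dollar = re.compile(
        r"(.*?)\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)", re.DOTALL
    )

    def parse(self, format_string):
        for literal, field, spec, conv in Formatter.parse(self, format_string):
            pos = 0
            for m in self._dollar.finditer(literal):
                # Emit each $name as if it had been written {name}.
                yield (m.group(1), m.group(2), "", None)
                pos = m.end()
            # Pass the remaining literal text and the original {field} through.
            yield (literal[pos:], field, spec, conv)

    def get_field(self, field_name, args, kwargs):
        # Evaluate the field as a Python expression in the given namespace.
        # eval is only acceptable here because the input is the user's own code.
        return eval(field_name, {}, kwargs), field_name


fmt = DollarSketchFormatter()
command = fmt.vformat("echo $f", (), {"f": "Ca\xf1o Cristales.pps"})
print(command)
```

Because the `$` handling lives entirely in `parse`, the rest of the formatting machinery is inherited unchanged, and the result is a unicode string throughout.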

@fperez
IPython member
@takluyver
IPython member
@fperez
IPython member
@minrk
IPython member

Sounds good - I'll slow down on the small ones, I was just trying to clean out some of the easy 0.12 issues.

Thomas, how were you thinking of adding '$foo' support to FullEval? All the actual parsing is handled by the str._formatter_parser method, so it seems like you would essentially have to rewrite the Itpl parser all over again.

@minrk
IPython member

I should also note that when I was digging into this (I do already have unicode Itpl working), I discovered a small related bug: os.system doesn't like unicode, so we have to make sure that we encode with unicode_to_str before passing the command to it.
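
The encoding step described here could look roughly like the sketch below. The helper name `encode_command` is hypothetical and only stands in for the unicode_to_str call mentioned above; the fallback to the filesystem encoding and the "replace" error handler are assumptions, not something confirmed in this thread. On Python 3, os.system accepts text directly, so this only matters for the Python 2 code path.

```python
import sys


def encode_command(cmd, encoding=None):
    """Return cmd as bytes, encoding text with the given or filesystem encoding."""
    if isinstance(cmd, bytes):
        # Already a byte string: pass it through untouched.
        return cmd
    # Assumption: fall back to the filesystem encoding when none is given.
    return cmd.encode(encoding or sys.getfilesystemencoding(), "replace")


encoded = encode_command(u"echo Ca\xf1o Cristales.pps", "utf-8")
print(repr(encoded))
```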

@fperez
IPython member
@takluyver
IPython member
@minrk
IPython member

okay, makes sense.

@fperez
IPython member
@minrk
IPython member

So am I correct in understanding that the official plan for this is for Thomas to add $foo support to FullEvalFormatter, and remove Itpl as part of the PromptManager PR?

@fperez
IPython member
@fperez fperez closed this in 09c9952 Nov 20, 2011
@fperez
IPython member

@takluyver, let me know if the test I added in 31ab23f causes any issues in py3. Thanks for the PR!

@stefanv

Should a person have access to environment variables? E.g., I can't do

!echo ${HOME}
@minrk
IPython member

@stefanv Yes, they should, though the point of this is to allow $HOME to get HOME from the IPython environment. I think the way it used to work was that you would use $$HOME to pass the string $HOME to the system call, though that does not work with this change.

@stefanv stefanv pushed a commit to stefanv/ipython that referenced this issue Nov 30, 2011
@takluyver takluyver Use DollarFormatter to fill in names in ! shell calls.
Closes gh-822
f5687fc
@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014
@takluyver takluyver Use DollarFormatter to fill in names in ! shell calls.
Closes gh-822
8d6fcbd
@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014
@fperez fperez Add failing test for #822, will be fixed next 0435d4f
@mattvonrocketstein mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this issue Nov 3, 2014
@fperez fperez String in test for #822 was meant to be a bytestring, fixes it. bc7c048