Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Very slow traceback construction from Cython extension #1317

Closed
wesm opened this Issue · 18 comments

3 participants

@wesm

I think this is the fault of IPython and I'm having a hard time coming up with a self-contained example, but basically I'm noticing a major lag between statement execution and IPython coming back with the formatted traceback when the exception is generated in Cython code. For example, with pandas git master (99e2eec, because I'm about to fix this bug), the following code takes about 8.5 seconds to play out:

from pandas import Series
s = Series({'T1': 5})
print s.groupby({'T1': 'T2'}).agg(sum)

Running this code

import time
start = time.time()
print s.groupby({'T1': 'T2'}).agg(sum)
time.time() - start

gives me

In [10]: time.time() - start
Out[10]: 8.473900079727173

By contrast, running the same code in the standard Python interpreter gives the stack trace virtually instantaneously.

Any ideas what's wrong? This same problem seems to be present in IPython git master, 0.12, and 0.11.

@takluyver
Owner

Installing pandas in my virtualenv seems to mess up my sys.path so that I get thrown back to IPython 0.11 (which I don't normally get with or without the virtualenv).

There's quite a lot of code to produce those tracebacks. If I switch to plain exception mode (using %xmode) they appear instantly, so it must be something going wrong in ultratb.

@takluyver
Owner

Installing pandas in my virtualenv adds these two lines to $VIRTUAL_ENV/lib/python2.7/site-packages/easy-install.pth, throwing the order of things on sys.path off:

/usr/lib/pymodules/python2.7
/usr/lib/python2.7/dist-packages

Do you want me to file a bug for this?

@minrk
Owner

What Cython/Python/etc.? I run this code with 99e2eec and get TypeError: unhashable type 'dict' immediately.

OSX 10.7 System Python 2.7.1
cython/cython@3d13ae3
wesm/pandas@99e2eec

@wesm

@takluyver is that pandas-related? Must be something weird going wrong in setuptools.

@minrk I'll try a newer Python. This is Python 2.7.2 on 64-bit Ubuntu

@takluyver
Owner

@wesm : I suspected ez_setup, but commenting that out from setup.py didn't seem to make a difference. I wonder if it's also related to Cython.

I replicated the slow traceback with Python 2.7.2 on 32-bit Ubuntu.

@wesm

Just updated to Cython git master and seems be OK now. Unclear why the behavior was only manifesting in the ultratb. Closing the issue (unless you guys want to investigate further)

@wesm wesm closed this
@minrk
Owner

Thanks - just for reference, what Cython version were you using before you updated? Could be that they had some actions that are triggered by inspection for better tracebacks, but require some work.

@wesm

Appears to be a problem with 262bc9f and others around that time frame. Could do a git bisect to see when it got better

@wesm

git bisect yielded:

0e579823bd34de5d1d9b4aeac2c8d727415cba2d is the first bad commit
commit 0e579823bd34de5d1d9b4aeac2c8d727415cba2d
Author: Mark Florisson <markflorisson88@gmail.com>
Date:   Sat Dec 24 23:01:25 2011 +0100

    Lower buffer_max_dims to 8

:040000 040000 5cef283c2e0a364154c99aa9876e118df137798f 4cb82498c330d2c45876bae61fc39447d0ff9941 M  Cython

I just verified that the bug is present in Cython 4b9a1ff and not in 0e57982. Maybe we should raise the issue with them so that they know it was a problem (in case it was fixed by accident)? It's strange because the change in 0e57982 is so tiny.

@wesm wesm reopened this
@wesm

So I scripted IPython to get a profiling of what's going on and it's extremely perplexing:

11:58 ~/Dropbox/book/svn $ python -m cProfile -s cumulative foo.py 
         8265151 function calls (8261576 primitive calls) in 12.425 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   12.427   12.427 foo.py:1(<module>)
        3    0.000    0.000   11.816    3.939 shell.py:118(process_input_line)
        3    0.000    0.000   11.815    3.938 interactiveshell.py:2358(run_cell)
        3    0.000    0.000   11.813    3.938 interactiveshell.py:2447(run_ast_nodes)
        3    0.000    0.000   11.813    3.938 interactiveshell.py:2511(run_code)
        1    0.000    0.000   11.781   11.781 interactiveshell.py:1594(showtraceback)
        1    0.000    0.000   11.781   11.781 ultratb.py:1153(structured_traceback)
        1    0.001    0.001   11.781   11.781 ultratb.py:1061(structured_traceback)
        1    2.146    2.146   11.780   11.780 ultratb.py:674(structured_traceback)
  3097026    3.163    0.000    9.454    0.000 tokenize.py:264(generate_tokens)
  3097286    6.095    0.000    6.095    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
        1    0.000    0.000    0.400    0.400 shell.py:70(__init__)
        2    0.000    0.000    0.397    0.199 configurable.py:287(instance)
        1    0.000    0.000    0.396    0.396 interactiveshell.py:399(__init__)
        1    0.000    0.000    0.364    0.364 interactiveshell.py:1436(init_history)
        1    0.000    0.000    0.364    0.364 history.py:396(__init__)
        1    0.000    0.000    0.297    0.297 history.py:93(__init__)
        5    0.297    0.059    0.297    0.059 {method 'execute' of 'sqlite3.Connection' objects}
        1    0.000    0.000    0.297    0.297 history.py:151(init_db)
        1    0.003    0.003    0.211    0.211 shell.py:4(<module>)
        1    0.002    0.002    0.113    0.113 __init__.py:99(<module>)
   703160    0.112    0.000    0.112    0.000 {method 'span' of '_sre.SRE_Match' objects}
        1    0.001    0.001    0.103    0.103 rcsetup.py:14(<module>)
        3    0.001    0.000    0.094    0.031 __init__.py:6(<module>)

hmm, the plot thickens. so something appears to be going very wrong inside generate_tokens. anyone with clues? I feel like we ought to figure this one out

@wesm

More: generate_tokens yields about 2.6 million tokens before raising an exception


12:18 ~/Dropbox/book/svn $ python -m pdb foo.py 
> /home/wesm/Dropbox/book/svn/foo.py(1)<module>()
-> from ipybook.shell import IPythonShell
(Pdb) b IPython/core/ultratb:838, 'tseries' in file
Breakpoint 1 at /home/wesm/code/ipython/IPython/core/ultratb.py:838
(Pdb) 
Breakpoint 2 at /home/wesm/code/ipython/IPython/core/ultratb.py:838
(Pdb) c
> /home/wesm/code/ipython/IPython/core/ultratb.py(838)structured_traceback()
-> name_cont = False
gen = generate_tokens(linereader)
gen.next()
(52, '\x00', (1, 0), (1, 1), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
gen.next()
(52, '\x00', (1, 1), (1, 2), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
gen.next()
(52, '\x00', (1, 2), (1, 3), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
gen.next()
(52, '\x00', (1, 3), (1, 4), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
gen.next()
(52, '\x00', (1, 4), (1, 5), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
results = []
while True: results.append(gen.next())
*** TokenError: ('EOF in multi-line statement', (38697, 0))
len(results)
2580658
@wesm

OK so I think this is a bug in IPython. No attempt to tokenize a binary .so or .pyd file should ever be made. After updating Cython, by some dumb luck, this happens on the 48th iteration of the generate_tokens generator


> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(841)structured_traceback()
-> print >> sys.stderr, file, lnum

/home/wesm/code/pandas/pandas/_tseries.so 27
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(842)structured_traceback()
-> print >> sys.stderr, i
n
47
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(844)structured_traceback()
-> if token_type == tokenize.NAME and token not in keyword.kwlist:
l
839     
840                     for i, (token_type, token, start, end, line) in enumerate(generate_tokens(linereader)):
841                         print >> sys.stderr, file, lnum
842                         print >> sys.stderr, i
843                         # build composite names
844  ->                     if token_type == tokenize.NAME and token not in keyword.kwlist:
845                             if name_cont:
846                                 # Continuation of a dotted name
847                                 try:
848                                     names[-1].append(token)
849                                 except IndexError:
token_type
4
n
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(860)structured_traceback()
-> elif token == '.':
n
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(862)structured_traceback()
-> elif token_type == tokenize.NEWLINE:
l
857                                 # to do (or the caller can print a list with repeated
858                                 # names if so desired.
859                                 names.append([token])
860                         elif token == '.':
861                             name_cont = True
862  ->                     elif token_type == tokenize.NEWLINE:
863                             break
864     
865                 except (IndexError, UnicodeDecodeError):
866                     # signals exit of tokenizer
867                     pass
n
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(863)structured_traceback()
-> break
l
858                                 # names if so desired.
859                                 names.append([token])
860                         elif token == '.':
861                             name_cont = True
862                         elif token_type == tokenize.NEWLINE:
863  ->                         break
864     
865                 except (IndexError, UnicodeDecodeError):
866                     # signals exit of tokenizer
867                     pass
868                 except tokenize.TokenError,msg:
token_type
4
token
'\n'
start
(1, 47)
end
(1, 48)
line
'\x00\x00\x00\x00\x00`\x083\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00h\x083\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n'

any proposed solutions?

@takluyver
Owner
@takluyver
Owner

Assigning this to myself, since I've touched ultraTB most recently.

@takluyver takluyver was assigned
@wesm

Probably won't work with Cython code, because it's part of the Python standard library. Any binary file should probably just be skipped

@takluyver
Owner

Are there other valid extensions for binary files with Python functions, beyond .so and .pyd? Is .dll possible on Windows?

@takluyver
Owner

See PR #1341

@takluyver takluyver closed this in 9fbeb53
@wesm

Much obliged and thanks as always.

@yarikoptic yarikoptic referenced this issue from a commit in yarikoptic/ipython
@takluyver takluyver Don't attempt to tokenize binary files for tracebacks.
Closes gh-1317
df9f02f
@mattvonrocketstein mattvonrocketstein referenced this issue from a commit in mattvonrocketstein/ipython
@takluyver takluyver Don't attempt to tokenize binary files for tracebacks.
Closes gh-1317
cb971dc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.