Very slow traceback construction from Cython extension #1317

Closed
wesm opened this issue Jan 24, 2012 · 18 comments · Fixed by #1341

wesm commented Jan 24, 2012

I think this is the fault of IPython, and I'm having a hard time coming up with a self-contained example, but basically I'm noticing a major lag between statement execution and IPython coming back with the formatted traceback when the exception is generated in Cython code. For example, with pandas git master (99e2eec, because I'm about to fix this bug), the following code takes about 8.5 seconds to complete:

from pandas import Series
s = Series({'T1': 5})
print s.groupby({'T1': 'T2'}).agg(sum)

Running this code

import time
start = time.time()
print s.groupby({'T1': 'T2'}).agg(sum)
time.time() - start

gives me

In [10]: time.time() - start
Out[10]: 8.473900079727173

By contrast, running the same code in the standard Python interpreter gives the stack trace virtually instantaneously.

Any ideas what's wrong? This same problem seems to be present in IPython git master, 0.12, and 0.11.

takluyver (Member) commented

Installing pandas in my virtualenv seems to mess up my sys.path so that I get thrown back to IPython 0.11 (which I don't normally get with or without the virtualenv).

There's quite a lot of code to produce those tracebacks. If I switch to plain exception mode (using %xmode) they appear instantly, so it must be something going wrong in ultratb.
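
For reference, the mode switch is just the %xmode magic; a minimal session, with typical IPython output shown (the prompt numbers are illustrative):

In [1]: %xmode Plain
Exception reporting mode: Plain

In [2]: %xmode Verbose
Exception reporting mode: Verbose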

takluyver (Member) commented

Installing pandas in my virtualenv adds these two lines to $VIRTUAL_ENV/lib/python2.7/site-packages/easy-install.pth, which throws off the ordering of sys.path:

/usr/lib/pymodules/python2.7
/usr/lib/python2.7/dist-packages

Do you want me to file a bug for this?
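
A quick way to confirm the resulting ordering is just to print sys.path (plain Python, nothing pandas-specific):

import sys
for p in sys.path:
    print p   # earlier entries shadow later ones, so order matters here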

minrk commented Jan 24, 2012

Which Cython/Python/etc. versions? I run this code with 99e2eec and get TypeError: unhashable type 'dict' immediately.

OSX 10.7 System Python 2.7.1
cython/cython@3d13ae3
wesm/pandas@99e2eec

wesm commented Jan 24, 2012

@takluyver is that pandas-related? Must be something weird going wrong in setuptools.

@minrk I'll try a newer Python. This is Python 2.7.2 on 64-bit Ubuntu

takluyver (Member) commented

@wesm : I suspected ez_setup, but commenting that out from setup.py didn't seem to make a difference. I wonder if it's also related to Cython.

I replicated the slow traceback with Python 2.7.2 on 32-bit Ubuntu.

wesm commented Jan 24, 2012

Just updated to Cython git master and it seems to be OK now. Unclear why the behavior was only manifesting in ultratb. Closing the issue (unless you guys want to investigate further).

wesm closed this as completed Jan 24, 2012

minrk commented Jan 25, 2012

Thanks - just for reference, what Cython version were you using before you updated? It could be that they have some actions, triggered by inspection to support better tracebacks, that require some work.

wesm commented Jan 25, 2012

Appears to be a problem with 262bc9f and others around that time frame. I could do a git bisect to see when it got better.

wesm commented Jan 25, 2012

git bisect yielded:

0e579823bd34de5d1d9b4aeac2c8d727415cba2d is the first bad commit
commit 0e579823bd34de5d1d9b4aeac2c8d727415cba2d
Author: Mark Florisson <markflorisson88@gmail.com>
Date:   Sat Dec 24 23:01:25 2011 +0100

    Lower buffer_max_dims to 8

:040000 040000 5cef283c2e0a364154c99aa9876e118df137798f 4cb82498c330d2c45876bae61fc39447d0ff9941 M  Cython

I just verified that the bug is present in Cython 4b9a1ff and not in 0e57982. Maybe we should raise the issue with them so that they know it was a problem (in case it was fixed by accident)? It's strange because the change in 0e57982 is so tiny.

wesm reopened this Jan 29, 2012

wesm commented Jan 29, 2012

So I scripted IPython to get a profile of what's going on, and it's extremely perplexing:

11:58 ~/Dropbox/book/svn $ python -m cProfile -s cumulative foo.py 
         8265151 function calls (8261576 primitive calls) in 12.425 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   12.427   12.427 foo.py:1(<module>)
        3    0.000    0.000   11.816    3.939 shell.py:118(process_input_line)
        3    0.000    0.000   11.815    3.938 interactiveshell.py:2358(run_cell)
        3    0.000    0.000   11.813    3.938 interactiveshell.py:2447(run_ast_nodes)
        3    0.000    0.000   11.813    3.938 interactiveshell.py:2511(run_code)
        1    0.000    0.000   11.781   11.781 interactiveshell.py:1594(showtraceback)
        1    0.000    0.000   11.781   11.781 ultratb.py:1153(structured_traceback)
        1    0.001    0.001   11.781   11.781 ultratb.py:1061(structured_traceback)
        1    2.146    2.146   11.780   11.780 ultratb.py:674(structured_traceback)
  3097026    3.163    0.000    9.454    0.000 tokenize.py:264(generate_tokens)
  3097286    6.095    0.000    6.095    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
        1    0.000    0.000    0.400    0.400 shell.py:70(__init__)
        2    0.000    0.000    0.397    0.199 configurable.py:287(instance)
        1    0.000    0.000    0.396    0.396 interactiveshell.py:399(__init__)
        1    0.000    0.000    0.364    0.364 interactiveshell.py:1436(init_history)
        1    0.000    0.000    0.364    0.364 history.py:396(__init__)
        1    0.000    0.000    0.297    0.297 history.py:93(__init__)
        5    0.297    0.059    0.297    0.059 {method 'execute' of 'sqlite3.Connection' objects}
        1    0.000    0.000    0.297    0.297 history.py:151(init_db)
        1    0.003    0.003    0.211    0.211 shell.py:4(<module>)
        1    0.002    0.002    0.113    0.113 __init__.py:99(<module>)
   703160    0.112    0.000    0.112    0.000 {method 'span' of '_sre.SRE_Match' objects}
        1    0.001    0.001    0.103    0.103 rcsetup.py:14(<module>)
        3    0.001    0.000    0.094    0.031 __init__.py:6(<module>)

Hmm, the plot thickens: something appears to be going very wrong inside generate_tokens. Anyone with clues? I feel like we ought to figure this one out.

wesm commented Jan 29, 2012

More: generate_tokens yields about 2.6 million tokens before raising an exception


12:18 ~/Dropbox/book/svn $ python -m pdb foo.py 
> /home/wesm/Dropbox/book/svn/foo.py(1)<module>()
-> from ipybook.shell import IPythonShell
(Pdb) b IPython/core/ultratb:838, 'tseries' in file
Breakpoint 1 at /home/wesm/code/ipython/IPython/core/ultratb.py:838
(Pdb) 
Breakpoint 2 at /home/wesm/code/ipython/IPython/core/ultratb.py:838
(Pdb) c
> /home/wesm/code/ipython/IPython/core/ultratb.py(838)structured_traceback()
-> name_cont = False
gen = generate_tokens(linereader)
gen.next()
(52, '\x00', (1, 0), (1, 1), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
gen.next()
(52, '\x00', (1, 1), (1, 2), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
gen.next()
(52, '\x00', (1, 2), (1, 3), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
gen.next()
(52, '\x00', (1, 3), (1, 4), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
gen.next()
(52, '\x00', (1, 4), (1, 5), '\x00\x00\x00\x00\x00 \t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00(\t3\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n')
results = []
while True: results.append(gen.next())
*** TokenError: ('EOF in multi-line statement', (38697, 0))
len(results)
2580658

wesm commented Jan 29, 2012

OK, so I think this is a bug in IPython. No attempt should ever be made to tokenize a binary .so or .pyd file. After updating Cython, by some dumb luck, the tokenizer loop happens to break out on the 48th iteration of the generate_tokens generator:


> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(841)structured_traceback()
-> print >> sys.stderr, file, lnum

/home/wesm/code/pandas/pandas/_tseries.so 27
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(842)structured_traceback()
-> print >> sys.stderr, i
n
47
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(844)structured_traceback()
-> if token_type == tokenize.NAME and token not in keyword.kwlist:
l
839     
840                     for i, (token_type, token, start, end, line) in enumerate(generate_tokens(linereader)):
841                         print >> sys.stderr, file, lnum
842                         print >> sys.stderr, i
843                         # build composite names
844  ->                     if token_type == tokenize.NAME and token not in keyword.kwlist:
845                             if name_cont:
846                                 # Continuation of a dotted name
847                                 try:
848                                     names[-1].append(token)
849                                 except IndexError:
token_type
4
n
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(860)structured_traceback()
-> elif token == '.':
n
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(862)structured_traceback()
-> elif token_type == tokenize.NEWLINE:
l
857                                 # to do (or the caller can print a list with repeated
858                                 # names if so desired.
859                                 names.append([token])
860                         elif token == '.':
861                             name_cont = True
862  ->                     elif token_type == tokenize.NEWLINE:
863                             break
864     
865                 except (IndexError, UnicodeDecodeError):
866                     # signals exit of tokenizer
867                     pass
n
> /home/wesm/code/repos/ipython/IPython/core/ultratb.py(863)structured_traceback()
-> break
l
858                                 # names if so desired.
859                                 names.append([token])
860                         elif token == '.':
861                             name_cont = True
862                         elif token_type == tokenize.NEWLINE:
863  ->                         break
864     
865                 except (IndexError, UnicodeDecodeError):
866                     # signals exit of tokenizer
867                     pass
868                 except tokenize.TokenError,msg:
token_type
4
token
'\n'
start
(1, 47)
end
(1, 48)
line
'\x00\x00\x00\x00\x00`\x083\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\xc2\x03\x12\x00\x00\x00\x00\x00h\x083\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00P\xa7\n'

Any proposed solutions?

takluyver (Member) commented

We call tokenize when generating the tracebacks, both for the coloured source and for the variable names on the relevant line in the 'context' traceback mode.

I'd guess it's trying to parse the compiled file, rather than the source, and falling down. But it doesn't raise an exception until it's done a load of parsing. I had a similar problem before, but it blew up instantly. We should probably figure out, by some means, that it's looking at a compiled file, and either look up the .pyx file or skip it altogether (I don't know if tokenize will work on Cython code, and of course the .pyx may not be present).

Relevant code is around here:
https://github.com/ipython/ipython/blob/master/IPython/core/ultratb.py#L827
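
A minimal sketch of that failure mode, using only the stdlib tokenize module on a binary file the way the traceback code effectively does (this is illustrative, not IPython's actual code; count_junk_tokens is a made-up helper for the example):

import io
import tokenize

def count_junk_tokens(path):
    # Read the compiled file as latin-1 "text" so readline() never fails,
    # roughly what happens when a .so path reaches the tokenizer.
    reader = io.open(path, encoding='latin-1').readline
    n = 0
    try:
        for _ in tokenize.generate_tokens(reader):
            n += 1   # each unrecognisable byte comes back as an ERRORTOKEN
    except tokenize.TokenError:
        pass         # "EOF in multi-line statement", as in the pdb session above
    return n

# The session above saw ~2.6 million tokens from pandas/_tseries.so
# before TokenError was finally raised.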

takluyver (Member) commented

Assigning this to myself, since I've touched ultraTB most recently.

takluyver was assigned Jan 29, 2012

wesm commented Jan 29, 2012

The tokenizer probably won't work with Cython code, because it's part of the Python standard library and only understands Python syntax. Any binary file should probably just be skipped.

takluyver (Member) commented

Are there other valid extensions for binary files with Python functions, beyond .so and .pyd? Is .dll possible on Windows?

takluyver (Member) commented

See PR #1341

fperez added a commit that referenced this issue Jan 29, 2012
Don't attempt to tokenize binary files for tracebacks.

Previously we had been trying and just catching the exception, but in corner cases the tokenizer can run for several seconds before raising an exception (#1317). This skips tokenizing if the file has the extension .so, .pyd or .dll.

Closes gh-1317.
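
A rough sketch of the check that commit message describes; the actual code lives in IPython/core/ultratb.py via PR #1341 and may differ in detail, and the names BINARY_EXTENSIONS and should_tokenize are hypothetical:

import os

# Compiled extension modules that the Python tokenizer cannot parse.
BINARY_EXTENSIONS = ('.so', '.pyd', '.dll')

def should_tokenize(filename):
    return os.path.splitext(filename)[1].lower() not in BINARY_EXTENSIONS

# should_tokenize('/home/wesm/code/pandas/pandas/_tseries.so') -> False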

wesm commented Jan 29, 2012

Much obliged and thanks as always.
