-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
traceback: add a new thin class storing a traceback without storing local variables #62111
Comments
For Tulip I may have to extract tracebacks and store the extracted data, in order to avoid cycles created by the frames referenced in the traceback. I'll be extracting the traceback every time an exception is raised, and in most cases I won't be using the extracted info, because usually the exception is caught (or logged) by user code. But it's very important to ensure that if the user code doesn't catch or log it, I can log a traceback, and I won't know that this is the case until a destructor is called, which may be quite a bit later. (Reference: http://code.google.com/p/tulip/source/browse/tulip/futures.py#38) Unfortunately it looks like the public APIs do a lot more work than needed. Ideally, there'd be something similar to _extract_tb_or_stack_iter() that doesn't call linecache.getline() -- it should just return triples of (filename, lineno, functionname), and enough structure to tell apart the __cause__, __context__, and __traceback__ (the first two possibly repeated). Given this info it would be simple enough to format and log the actual traceback, and storing just this would take less space and time than computing the lines of the fully-formatted traceback. |
I think instead we may want to add a finalize() or close() method on frame objects which would clear all local variables (as well as dereference the globals dict, perhaps), after having optionally run a generator's close() method (if the frame belongs to a generator). If I'm not mistaken, it should allow breaking reference cycles, and remove the need for complex traceback processing, which Twisted currently also does: http://twistedmatrix.com/trac/browser/trunk/twisted/python/failure.py#L89 Note that generator cleanup through the frame has a patch in bpo-17807. |
I'm not snubbing my nose at something that breaks these types of cycles more easily, but I still think that the abstractions offered by the traceback module could be improved. |
On Mon, May 6, 2013 at 6:06 AM, Antoine Pitrou <report@bugs.python.org> wrote:
Sadly that doesn't help with my issue. I applied path gen3.patch and |
It won't. It's just a building block for the change I've proposed here. |
Antoine - I like your idea, but I think it's a separate issue. I agree with Guido that exposing some lower level non-formatting APIs in the traceback module would be helpful. I see Guido's suggestion here as similar to the change we just made in the dis module to expose a dis.get_instructions iterator. That change makes it much easier to work with the disassembler output programmatically, whereas the module was previously too focused on displaying text with a specific format. My current thoughts: define a "TracebackSummary" with the following fields:
stack_summary would be a list of (filename, lineno, functionname) triples as Guido suggested (probably a named tuple) cause and context would be either None or a reference to an appropriate TracebackSummary object Basically, the summary should contain all of the information needed to create the formatted traceback output, without actually doing any formatting (aside from calling repr() on the exception) |
That sounds great, except I'd expect the exception type and str(), since a formatted traceback uses the str() of the exception, not its repr(). Personally I don't think the tuples need to be named, but I won't hold up progress either. :-) |
You may want two pieces of stack: the piece of stack that is above and including the catch point, and the piece of stack that is below it. The former is what gets traditionally printed in tracebacks, but the latter can be useful as well (see bpo-1553375 for a related feature request). |
Ok, I've spinned off the frame clearing suggestion to bpo-17934. |
I've been looking into this as an easy piece to bite off. If I understand Guido correctly, he'd like to defer or suppress the linecache call when getting the tb summary. The problem with deferring is that you need to access f_globals for the loader to work correctly when the module source is a non-file import source. If we keep a reference to f_globals for each line in the traceback, we can defer this to later (ideally we'd just keep the __loader__ value, but that would require changing the linecache interface as well). My inclination would be to have another keyword argument to _extract_tb_or_stack_iter (terse=False or verbose=True - either style works). In terse mode, no source lines would be available, and the formatted output would be the same as if the source wasn't available at all. This would work, although the traceback module is structured so that I'd need to pass it through quite a few wrapped iterator calls. I'm not sure how free a hand I have when it comes to refactoring the internal implementation. I'm not fond of the extractor callbacks - I'd prefer a generator-based approach on the lines of: def _tb_iter(tb, limit):
i = 0
while tb is not None:
if limit is not None and limit < i:
break
yield tb.tb_frame, tb.tb_lineno
tb = tb.tb_next
i += 1
def _extract_frame_iter(frames, terse=False):
...
for frame, lineno in frames:
... |
In terms of how much freedom you have about changing the internal, I'd check how long ago they were changed. "Internal" APIs that have been stable for many versions tend to have illicit external uses -- but internal APIs that were introduced recently (e.g. in 3.2) are usually safe to use -- nobody is going to make too much of a stink if you break their code. As for saving f_globals, if you're going to save an extra pointer anyways, why not just save the frame pointer? |
Or you could add a *new* linecache interface. |
After thinking about it, I decided to defer instead of suppress line fetching. Suppressing would still give a traceback later, but not the same as the original. extract_exception is where the meat is. There's a slight quirk in how context works - if cause is set to something else than the actual exception, context still gets set even though it won't be printed. This is to enable later analysis. However, if you explicitly 'raise Exception from None' neither of them will be set - the formatting code would have issues otherwise. I'm not too happy about the code duplication between the format_exception method and the formatting that works on exceptions directly, but there are plenty of tests to ensure that the output is identical. Feel free to dispute my naming choices. I'm admittedly influenced by twisted. |
I think ExceptionSummary could be the visible API, instead of adding an indirection through a separate global function. |
I was trying to stay with the established pattern of the existing methods. There are two unrelated issues to solve here - deferring linecache access, and the extract_exception functionality. When it comes to deferral, we could wrap each line in a different class than tuple, but this constitutes a public API change (and this is a *very* widely used library). Even changing to a namedtuple would cause a lot of grief, since we'd break lots of doctests, and subclassing tuple is a lot of effort for little benefit. If we only do it for the deferred usage, it would "only" be inconsistent. I suppose we could have a completely separate way of doing things for ExceptionSummary, but again, inconsistent, and I think if one extract_xxx method provides the functionality, users would expect it to be available in the others. For ExceptionSummary, the same summary instance is used more than once if you have a cause set, so object creation had to be separated from the graph population. Either this is done in a constructor function or we would need some obscure tricks with the class kwargs. This way, there's a separation of concerns - the class wraps *one* exception, and the function creates a bunch and links them together as needed. |
YMMV, but I think we should go for inconsistency here. The other APIs This feature request is a good opportunity to design something a little |
For dis, we introduced a new rich bytecode introspection API. Ditto for |
That would likely be the cleanest approach. |
While trying to fix a similar issue for the asyncio project, I wrote the following code: I was not aware that a similar approach (attached traceback2.patch) was already implemented. Asyncio issues: See also the Python issue bpo-20032. |
Belatedly looking at this again: one thing I noticed is that the summary currently loses info if __cause__ is set. It would be better to follow the PEP 409/415 model and report the "__suppress_context__" flag as part of the summary, rather than dropping the context information. That allows the decision on whether or not to show the context information to be deferred to display time, rather than discarding it eagerly. |
I'll put something together. |
w.r.t. a new linecache interface, it looks like we need two attributes from f_globals: __name__ and __loader__, so that we can eventually call __loader__.get_source(name). One small change (to let me focus on traceback) would be to add another kw argument to the existing calls that take module_globals, called e.g. get_source, which would be a simple callable (source = get_source()). For the deferred linecache situation, we'd then create partial(f_globals.__loader__.get_source, f_globals.__name__) and keep that around until we need the source. We could even abstract that out into a function in linecache to keep the knowledge in one place. Another way to tackle this would be to add a new function to linecache that lazily seeds the cache: it would stash the get_source method and the name against the filename, and when getline[s] is called actually invoke it. I think the second way is a bit nicer myself. What do folk think? |
Ok, here's a draft patch for linecache. Next up, poking around the new TB API. |
One thing I noticed while working on this, traceback today stats each included file once per frame per call into e.g. format_list. This might actually account for quite some of the overhead. I'll include an optimisation for this in my new API. http://paste.openstack.org/show/158714/ |
Le 06/02/2015 09:34, Nick Coghlan a écrit :
Using pydoc isn't generally a successful way of discovering an API. It pydoc is useful mostly as a Python-specific "man" command: you know |
dir() will get me TracebackException as a name, help(traceback.TracebackException) will get me its signature. IDEs with autocomplete and signature tooltips will do the same. There is nothing in those usage sequences to say "don't use __init__, use this redundant class method with the same signature instead". I agree providing a "directly from an exception" constructor is essential for cases where you're working with a caught exception at the Python level. I don't agree with the idea of duplicating the required low level API under a different name so it doesn't *look* like a lower level API in the documentation. |
As suggested adding patch from http://bugs.python.org/issue2786 which prints qualnames in function call exceptions. e.g: >>> class A:
... def __init__(self):
... pass
>>> A(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: A.__init__() takes 1 positional argument but 2 were given |
Sorry xonatius I wasn't clear: AFAICT your patch is going to require changes to the traceback tests, and this issue is changing the implementation substantially: I was suggesting that you make sure your patch applies on top of this issue, not that you merge them :) |
Nick, Antoine - I'm now stuck between you. Options going forward:
I could cripple __init__ by switching to __new__, but I think thats massive overcomplication and not needed. I can of course add more text to the pydoc prose saying 'do not use __init__' if that would address Nicks concern. |
Escalating to python-dev for API design feedback is the usual path forward I'll make one more attempt at persuading Antoine here though: the fact that There's a split between the "low level API that exposes implementation |
You can also switch back to a single constructor, taking either an Switching to python-dev would probably be overkill for this. |
I'm idealogically opposed to polymorphic interpretation of args :) Antoine, will you be ok with one __init__ and one classmethod? |
Hey all, great to see this being worked on so diligently for so long. Having worked in this area for a while (at home and at PayPal), we've got a few learnings to share:
For a lightweight traceback wrapper to be concurrency-friendly, we've had to catch KeyErrors, like so: https://github.com/mahmoud/boltons/blob/master/boltons/tbutils.py#L115 It's kind of a blanket approach, but maybe we could make a separate issue and help out with a linecache refresh?
https://github.com/mahmoud/boltons/blob/master/boltons/tbutils.py#L134 Let me know if you've got any questions on that, and keep up the good work! |
Ok, all changes applied, lets see how this looks to folk. |
New changeset 73afda5a4e4c by Robert Collins in branch 'default': |
New changeset 7cea10917f40 by Robert Collins in branch 'default': |
With 7cea10917f40 applied, multiple tests are now failing on most buildbots. For example, on OS X 10.10, test_cgitb test_code_module test_decimal: test_blanks (test.test_cgitb.TestCgitb) ... ok test_banner (test.test_code_module.TestInteractiveConsole) ... ok 1 items had failures: See also, for example, the Debian buildbot where, in addition, test_fractions failed: http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%203.x/builds/140/steps/test/logs/stdio |
Thats really strange, I did a ./python -m test run before committing - the brown bag was due to me running with the patch for 22936 also applied. Looking into the failures reported now. |
Ok, cgitb - its passing a string instead of an exception type into format_exception - something that was never supported - it only works by accident AFAICT, because the old format code was ignoring the etype - it was deriving the type from the value. Thats easy to accomodate in the compat code. |
Also apologies - ned told me on IRC that python -m test -uall is needed, not just -m test. Doh. |
There are failures on buildbot. Examples: ====================================================================== Traceback (most recent call last):
AttributeError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/python/3.x.langa-ubuntu/build/Lib/code.py", line 91, in runcode
exec(code, self.locals)
File "<console>", line 1, in <module>
ValueError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/python/3.x.langa-ubuntu/build/Lib/test/test_code_module.py", line 85, in test_cause_tb
self.console.interact()
File "/opt/python/3.x.langa-ubuntu/build/Lib/code.py", line 245, in interact
more = self.push(line)
File "/opt/python/3.x.langa-ubuntu/build/Lib/code.py", line 267, in push
more = self.runsource(source, self.filename)
File "/opt/python/3.x.langa-ubuntu/build/Lib/code.py", line 75, in runsource
self.runcode(code)
File "/opt/python/3.x.langa-ubuntu/build/Lib/code.py", line 95, in runcode
self.showtraceback()
File "/opt/python/3.x.langa-ubuntu/build/Lib/code.py", line 144, in showtraceback
for value, tb in traceback._iter_chain(*ei[1:]):
AttributeError: module 'traceback' has no attribute '_iter_chain' ====================================================================== Traceback (most recent call last):
File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/test/test_cgitb.py", line 23, in test_html
raise ValueError("Hello World")
ValueError: Hello World
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/test/test_cgitb.py", line 27, in test_html
html = cgitb.html(sys.exc_info())
File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/cgitb.py", line 189, in html
''.join(traceback.format_exception(etype, evalue, etb)))
File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/traceback.py", line 109, in format_exception
etype, value, tb, limit=limit).format(chain=chain))
File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/traceback.py", line 433, in __init__
if exc_type and issubclass(exc_type, SyntaxError):
TypeError: issubclass() arg 1 must be a class |
code module is using a _function from within the traceback module to filter out a frame - we can do that with the new interface easily (it filters the first tem). |
The decimal failure is due to _format_final_exc_line now filtering out 'None' as well, because the string captured values of objects leads to None -> 'None' before we do rendering. I think in this case its reasonable to change the behaviour (since None itself was already filtered out, its only the special case of a value of 'None' that worked previously). |
Fixes for buildbots. |
Much better, thanks! |
New changeset 648b35f22b91 by Berker Peksag in branch 'default': |
Looking at the regression now. |
Regression fixed AFAICT, please re-open if not. |
Robert: in bpo-17911-2.patch (and the eventual commit) you added a check for value == 'None' in _format_final_exc_line(). Why was this necessary? I think it is the cause of my bpo-27348 regression: >>> traceback.format_exception(Exception, Exception("One"), None)
['Exception: One\n']
>>> traceback.format_exception(Exception, Exception("None"), None)
['Exception\n'] |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: