Skip to content

Order variables by closeness to executing statement in pure_eval #807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 2, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 36 additions & 8 deletions sentry_sdk/integrations/pure_eval.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,14 @@

import ast

from sentry_sdk import Hub
from sentry_sdk import Hub, serializer
from sentry_sdk._types import MYPY
from sentry_sdk.integrations import Integration, DidNotEnable
from sentry_sdk.scope import add_global_event_processor
from sentry_sdk.utils import walk_exception_chain, iter_stacks

if MYPY:
from typing import Optional, Dict, Any
from typing import Optional, Dict, Any, Tuple, List
from types import FrameType

from sentry_sdk._types import Event, Hint
Expand Down Expand Up @@ -75,7 +75,9 @@ def add_executing_info(event, hint):
continue

for sentry_frame, tb in zip(sentry_frames, tbs):
sentry_frame["vars"].update(pure_eval_frame(tb.tb_frame))
sentry_frame["vars"] = (
pure_eval_frame(tb.tb_frame) or sentry_frame["vars"]
)
return event


Expand All @@ -89,16 +91,42 @@ def pure_eval_frame(frame):
if not statements:
return {}

stmt = list(statements)[0]
scope = stmt = list(statements)[0]
while True:
# Get the parent first in case the original statement is already
# a function definition, e.g. if we're calling a decorator
# In that case we still want the surrounding scope, not that function
stmt = stmt.parent
if isinstance(stmt, (ast.FunctionDef, ast.ClassDef, ast.Module)):
scope = scope.parent
if isinstance(scope, (ast.FunctionDef, ast.ClassDef, ast.Module)):
break

evaluator = pure_eval.Evaluator.from_frame(frame)
expressions = evaluator.interesting_expressions_grouped(stmt)
expressions = evaluator.interesting_expressions_grouped(scope)

def closeness(expression):
# type: (Tuple[List[Any], Any]) -> int
# Prioritise expressions with a node closer to the statement executed
# without being after that statement
# A higher return value is better - the expression will appear
# earlier in the list of values and is less likely to be trimmed
nodes, _value = expression
nodes_before_stmt = [
node for node in nodes if node.first_token.startpos < stmt.last_token.endpos
]
if nodes_before_stmt:
# The position of the last node before or in the statement
return max(node.first_token.startpos for node in nodes_before_stmt)
else:
# The position of the first node after the statement
# Negative means it's always lower priority than nodes that come before
# Less negative means closer to the statement and higher priority
return -min(node.first_token.startpos for node in nodes)

# This adds the first_token and last_token attributes to nodes
atok = source.asttokens()
return {atok.get_text(nodes[0]): value for nodes, value in expressions}

expressions.sort(key=closeness, reverse=True)
return {
atok.get_text(nodes[0]): value
for nodes, value in expressions[: serializer.MAX_DATABAG_BREADTH]
}
71 changes: 64 additions & 7 deletions tests/integrations/pure_eval/test_pure_eval.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
import sys
from types import SimpleNamespace

import pytest

from sentry_sdk import capture_exception
from sentry_sdk import capture_exception, serializer
from sentry_sdk.integrations.pure_eval import PureEvalIntegration


Expand All @@ -10,8 +13,27 @@ def test_with_locals_enabled(sentry_init, capture_events, integrations):
events = capture_events()

def foo():
foo.d = {1: 2}
print(foo.d[1] / 0)
namespace = SimpleNamespace()
q = 1
w = 2
e = 3
r = 4
t = 5
y = 6
u = 7
i = 8
o = 9
p = 10
a = 11
s = 12
str((q, w, e, r, t, y, u, i, o, p, a, s)) # use variables for linter
namespace.d = {1: 2}
print(namespace.d[1] / 0)

# Appearances of variables after the main statement don't affect order
print(q)
print(s)
print(events)

try:
foo()
Expand All @@ -28,8 +50,43 @@ def foo():
frame_vars = event["exception"]["values"][0]["stacktrace"]["frames"][-1]["vars"]

if integrations:
assert sorted(frame_vars.keys()) == ["foo", "foo.d", "foo.d[1]"]
assert frame_vars["foo.d"] == {"1": "2"}
assert frame_vars["foo.d[1]"] == "2"
# Values closest to the exception line appear first
# Test this order if possible given the Python version and dict order
expected_keys = [
"namespace",
"namespace.d",
"namespace.d[1]",
"s",
"a",
"p",
"o",
"i",
"u",
"y",
]
if sys.version_info[:2] == (3, 5):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this conditional can't we do assert set(frame_vars.keys()) == set(expected_keys)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you misunderstood. frame_vars.keys() is set-like and so assert frame_vars.keys() == set(expected_keys) would work in all versions. assert list(frame_vars.keys()) == expected_keys is a stronger assertion to make extra sure (when possible) that values are prioritised as expected. The tests are fine as they are, the reason for the failure in 3.5 is that the implementation actually doesn't work and the values are actually wrong.

The problem is that I return OrderedDict above so that the trimming in serialize will keep the first 10 items of the dict, but just before the trimming the OrderedDict gets converted to a dict (see the PR description) which breaks the order in 3.5 and thus the implementation (the wrong values get trimmed). I'm asking if I should just trim the values within the pure_eval integration and forget the OrderedDict, or try to preserve order in serialize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear both options should be equivalent right now, but if serialize changes its algorithm and starts trimming things more cleverly in the future, it will be better if it has the full dict of variables in order of priority rather than a dict that was eagerly/preemptively trimmed by pure_eval.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I recommend going for option 3 but avoiding hardcoding the number 10 (instead importing the var). I don't think there's a safe alternative to dict outside of pure_eval context, so I'd like to avoid changes to serializer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if serialize changes its algorithm and starts trimming things more cleverly in the future

I doubt this will happen in serialization, let's assume pure_eval always knows better (seems likely given the amount of extra analysis it does)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

assert frame_vars.keys() == set(expected_keys)
else:
assert list(frame_vars.keys()) == expected_keys
assert frame_vars["namespace.d"] == {"1": "2"}
assert frame_vars["namespace.d[1]"] == "2"
else:
assert sorted(frame_vars.keys()) == ["foo"]
# Without pure_eval, the variables are unpredictable.
# In later versions, those at the top appear first and are thus included
assert frame_vars.keys() <= {
"namespace",
"q",
"w",
"e",
"r",
"t",
"y",
"u",
"i",
"o",
"p",
"a",
"s",
"events",
}
assert len(frame_vars) == serializer.MAX_DATABAG_BREADTH