Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(code-mappings): Add new task to find projects with missing code mappings #40271

Merged
merged 11 commits into from Oct 24, 2022

Conversation

snigdhas
Copy link
Member

Add new task to find projects with missing code mappings.

Spec
WOR-2231

is_python_project, fn = get_filenames(project, event.data)
# Note: this will skip any possible python files in a project if we find a single
# # non-".py" file in the stacktrace. Is this too unforgiving?
if not is_python_project:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for the stacktrace to have non-python files? If so, this will skip the python files we found for the project as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question but in general terms it is not likely.

If it happens, it is because the developer purposely named the file incorrectly [1] or that the project got set up for multiple languages reporting there (sharing the same DSN).

[1]

$ python foo.txt
Traceback (most recent call last):
  File "foo.txt", line 2, in <module>
    raise Exception("bar")
Exception: bar

Copy link
Member

@armenzg armenzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job, Snigdha!

There's few changes I want around the nomenclature.
I will review the test file thoroughly once you make these changes and see fix the CI issues.

src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved
src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved
max_retries=0, # if we don't backfill it this time, we'll get it the next time
)
def find_missing_codemappings(**kwargs):
organizations = kwargs.get(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a parameter rather than a kwargs?

queue="find_missing_codemappings",
max_retries=0, # if we don't backfill it this time, we'll get it the next time
)
def find_missing_codemappings(**kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a docstring with the overall logic of this task?

src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved
src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved
for st in stacktraces:
try:
fn = [frame["filename"] for frame in st["frames"]]
if fn[0].endswith(".py"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love your attention to detail. Well done.

src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved
src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved
src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved
snigdhas and others added 2 commits October 20, 2022 09:28
Co-authored-by: Armen Zambrano G. <armenzg@sentry.io>
Copy link
Member

@armenzg armenzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very close. Few more touches.

src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved
src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved

def test_finds_stacktrace_paths_single_project(self):
self.store_event(
data={
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this and the few instances in the code base fixtures.
I've filed a ticket https://getsentry.atlassian.net/browse/WOR-2312

For the moment, could you make this a global object that you can copy and use in the few store_event calls you have? After copying you can make explicit changes (e.g. obj['data']['stacktrace']['frames'][0]['abs_path'] = 'different_path') or a function that you can pass some parameters like frames.

I personally find placing the structures in the middle of the test make them harder to read and harder or even compare the differences between the objects.

src/sentry/tasks/find_missing_codemappings.py Outdated Show resolved Hide resolved
tests/sentry/tasks/test_find_missing_codemappings.py Outdated Show resolved Hide resolved
data={
"message": "Kaboom!",
"platform": "python",
"timestamp": iso_format(before_now(days=1)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want a test that has only one event within the 7 day range and another event outside of the 14 days range.

You may already be testing for this but from a quick look I wasn't able to spot it.

FYI I use --cov=src --cov-report=html with pytest when I want to figure out how much of my own code is covered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great tip, thanks!

"function": "handle_set_commits",
"abs_path": "/usr/src/sentry/src/sentry/tasks.py",
"module": "sentry.tasks",
"in_app": False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if in_app is always available? If it's set to False we should not include the frame in our analysis.

See the difference in the UI:
image
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that's a great point - I'm not sure. Let me look into it and fix this in a followup.

tests/sentry/tasks/test_find_missing_codemappings.py Outdated Show resolved Hide resolved
tests/sentry/tasks/test_find_missing_codemappings.py Outdated Show resolved Hide resolved
@@ -684,6 +684,7 @@ def SOCIAL_AUTH_DEFAULT_USERNAME():
Queue("replays.delete_replay", routing_key="replays.delete_replay"),
Queue("counters-0", routing_key="counters-0"),
Queue("triggers-0", routing_key="triggers-0"),
Queue("find-missing-codemappings", routing_key="find-missing-codemappings"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we need to register it:

CeleryQueueRegisteredTest.test

AssertionError: Found tasks with queues that are undefined. These must be defined in settings.CELERY_QUEUES.
  Task Info:
   - Task: sentry.tasks.find_missing_codemappings, Queue: find_missing_codemappings.
assert not [' - Task: sentry.tasks.find_missing_codemappings, Queue: find_missing_codemappings']

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think it has to do with find-missing-codemappings vs find_missing_codemappings.

snigdhas and others added 2 commits October 21, 2022 09:52
Co-authored-by: Armen Zambrano G. <armenzg@sentry.io>
Co-authored-by: Armen Zambrano G. <armenzg@sentry.io>
@@ -684,6 +684,7 @@ def SOCIAL_AUTH_DEFAULT_USERNAME():
Queue("replays.delete_replay", routing_key="replays.delete_replay"),
Queue("counters-0", routing_key="counters-0"),
Queue("triggers-0", routing_key="triggers-0"),
Queue("find_missing_codemappings", routing_key="find_missing_codemappings"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! No queue test complaints.

Are you going to rename the module name to derive_code_mappings in this PR or another one?

Copy link
Member

@armenzg armenzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Just tackle my questions and handle with code changes if necessary. Well done!

tests/sentry/tasks/test_find_missing_codemappings.py Outdated Show resolved Hide resolved
tests/sentry/tasks/test_find_missing_codemappings.py Outdated Show resolved Hide resolved
tests/sentry/tasks/test_find_missing_codemappings.py Outdated Show resolved Hide resolved
@@ -684,6 +684,7 @@ def SOCIAL_AUTH_DEFAULT_USERNAME():
Queue("replays.delete_replay", routing_key="replays.delete_replay"),
Queue("counters-0", routing_key="counters-0"),
Queue("triggers-0", routing_key="triggers-0"),
Queue("derive_code_mappings", routing_key="derive_code_mappings"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to speak to ops to have them assign workers to this queue

Comment on lines +20 to +27
@instrumented_task( # type: ignore
name="sentry.tasks.derive_code_mappings.identify_stacktrace_paths",
queue="derive_code_mappings",
max_retries=0, # if we don't backfill it this time, we'll get it the next time
)
def identify_stacktrace_paths(
organizations: Optional[List[Organization]] = None,
) -> Mapping[str, Mapping[str, List[str]]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this be called, and how many orgs are we generally likely to pass?

Comment on lines +56 to +59
groups = Group.objects.filter(
project=project, last_seen__gte=timezone.now() - GROUP_ANALYSIS_RANGE
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a significant number of groups, depending on project. If it is too many, you might hit OOM issues.

Since you're just processing one at a time, you could use RangeQuerysetWrapper to keep memory usage bounded.

I'm also not totally sure you need to fetch all groups from this time range. It seems like you mostly just want to sample a few groups and check their stack trace? Another way to do this is to query snuba for events from the last GROUP_ANALYSIS_RANGE and get a distinct list of platforms from them. That would allow this to be done with one query per project

Copy link
Member

@wedamija wedamija Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, even better, you can make use of ProjectPlatform, which is generated by this task

def collect_project_platforms(paginate=1000, **kwargs):
now = timezone.now()
for page_of_project_ids in paginate_project_ids(paginate):
queryset = (
Group.objects.using_replica()
.filter(
last_seen__gte=now - timedelta(days=1),
project_id__in=page_of_project_ids,
platform__isnull=False,
)
.values_list("platform", "project_id")
.distinct()
)
for platform, project_id in queryset:
platform = platform.lower()
if platform not in VALID_PLATFORMS:
continue
ProjectPlatform.objects.create_or_update(
project_id=project_id, platform=platform, values={"last_seen": now}
)
# remove (likely) unused platform associations
ProjectPlatform.objects.filter(last_seen__lte=now - timedelta(days=90)).delete()

Should be fine to just look at this - the platform passed via events is stored on the group, and we typically trust the platform passed by the sdk

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think I'm misunderstanding what this is doing, where are these paths being used in general?


all_stacktrace_paths = set()
for group in groups:
event = group.get_latest_event()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing this once per group will likely be quite slow since this will be making n+1 queries. I think you should be able to batch these by passing a list of group ids to a snuba query.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snigdhas
Copy link
Member Author

Merging this to keep things moving along. This isn't called anywhere yet and I'll fix a few things in followup PRs.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants