feat(code-mappings): Add new task to find projects with missing code mappings #40271

snigdhas · 2022-10-19T17:31:19Z

Add new task to find projects with missing code mappings.

Spec
WOR-2231

snigdhas · 2022-10-19T19:34:17Z

src/sentry/tasks/find_missing_codemappings.py

+        is_python_project, fn = get_filenames(project, event.data)
+        # Note: this will skip any possible python files in a project if we find a single
+        # #  non-".py" file in the stacktrace. Is this too unforgiving?
+        if not is_python_project:


Is it possible for the stacktrace to have non-python files? If so, this will skip the python files we found for the project as well.

That's a good question, but in general it is unlikely.

If it happens, it is because the developer deliberately gave the file a non-.py name [1] or because the project was set up to receive events from multiple languages (sharing the same DSN).

[1]

$ python foo.txt
Traceback (most recent call last):
  File "foo.txt", line 2, in <module>
    raise Exception("bar")
Exception: bar
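If the all-or-nothing check ever turns out to be too strict, one more forgiving alternative would be to keep the ".py" frames and ignore the rest. This is only a sketch and not what the PR does; `python_filenames` is a hypothetical helper and the frame dicts are simplified.

```python
from typing import Dict, List


def python_filenames(frames: List[Dict[str, str]]) -> List[str]:
    """Return only the filenames that end in .py, skipping everything else."""
    return [
        frame["filename"]
        for frame in frames
        if frame.get("filename", "").endswith(".py")
    ]
```

With this approach a stray `foo.txt` frame would be dropped without discarding the Python paths found in the same stacktrace.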

armenzg

Good job, Snigdha!

There are a few changes I'd like around the nomenclature.
I will review the test file thoroughly once you make these changes and fix the CI issues.

armenzg · 2022-10-20T12:21:47Z

src/sentry/tasks/find_missing_codemappings.py

+    max_retries=0,  # if we don't backfill it this time, we'll get it the next time
+)
+def find_missing_codemappings(**kwargs):
+    organizations = kwargs.get(


Can this be an explicit parameter rather than kwargs?
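A minimal sketch of the difference, using hypothetical stand-ins for the task body. The real task is wrapped in `instrumented_task` and works with organizations; plain strings are used here so the snippet runs on its own.

```python
from typing import List, Optional


def find_with_kwargs(**kwargs) -> List[str]:
    # Callers must know the magic key; a misspelled keyword is silently ignored.
    organizations = kwargs.get("organizations") or []
    return [slug.lower() for slug in organizations]


def find_with_parameter(organizations: Optional[List[str]] = None) -> List[str]:
    # The accepted input is visible in the signature and can be type-checked.
    return [slug.lower() for slug in (organizations or [])]
```

With the explicit parameter, a misspelled keyword raises `TypeError` instead of quietly producing an empty result.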

armenzg · 2022-10-20T12:22:17Z

src/sentry/tasks/find_missing_codemappings.py

+    queue="find_missing_codemappings",
+    max_retries=0,  # if we don't backfill it this time, we'll get it the next time
+)
+def find_missing_codemappings(**kwargs):


Could you please add a docstring with the overall logic of this task?

armenzg · 2022-10-20T12:57:41Z

src/sentry/tasks/find_missing_codemappings.py

+    for st in stacktraces:
+        try:
+            fn = [frame["filename"] for frame in st["frames"]]
+            if fn[0].endswith(".py"):


I love your attention to detail. Well done.

Co-authored-by: Armen Zambrano G. <armenzg@sentry.io>

armenzg

This is very close. Just a few more touches.

armenzg · 2022-10-21T13:35:24Z

tests/sentry/tasks/test_find_missing_codemappings.py

+
+    def test_finds_stacktrace_paths_single_project(self):
+        self.store_event(
+            data={


We should turn this and the few similar instances in the code base into fixtures.
I've filed a ticket https://getsentry.atlassian.net/browse/WOR-2312

For the moment, could you make this a global object that you can copy and use in the few store_event calls you have? After copying you can make explicit changes (e.g. obj['data']['stacktrace']['frames'][0]['abs_path'] = 'different_path'), or you can write a function that accepts parameters such as frames.

I personally find that placing the structures in the middle of the test makes them harder to read and makes it harder to compare the differences between the objects.
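Both options mentioned above (a shared object that is copied, or a small factory function) could look roughly like this. `BASE_EVENT` and `make_event` are hypothetical names and the field values are illustrative only.

```python
import copy
from typing import Any, Dict, List, Optional

# Hypothetical shared payload defined once at module level.
BASE_EVENT: Dict[str, Any] = {
    "data": {
        "message": "Kaboom!",
        "platform": "python",
        "stacktrace": {
            "frames": [
                {
                    "function": "handle_set_commits",
                    "abs_path": "/usr/src/sentry/src/sentry/tasks.py",
                    "module": "sentry.tasks",
                    "in_app": True,
                }
            ]
        },
    }
}


def make_event(frames: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    """Return an independent copy of BASE_EVENT, optionally with other frames."""
    event = copy.deepcopy(BASE_EVENT)  # deep copy so nested edits never leak
    if frames is not None:
        event["data"]["stacktrace"]["frames"] = frames
    return event
```

A deep copy matters here because the frames are nested; a shallow copy would let one test's edits leak into the shared object.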

armenzg · 2022-10-21T13:45:23Z

tests/sentry/tasks/test_find_missing_codemappings.py

+            data={
+                "message": "Kaboom!",
+                "platform": "python",
+                "timestamp": iso_format(before_now(days=1)),


You may want a test that has only one event within the 7-day range and another event outside of the 14-day range.

You may already be testing for this but from a quick look I wasn't able to spot it.
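A sketch of the boundary check such a test would exercise, mirroring the `last_seen__gte=now - GROUP_ANALYSIS_RANGE` filter without needing a database. `ANALYSIS_RANGE`, its 14-day value, and `FakeGroup` are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple

ANALYSIS_RANGE = timedelta(days=14)  # assumed value, for illustration only


class FakeGroup(NamedTuple):
    id: int
    last_seen: datetime


def groups_in_range(groups: List[FakeGroup], now: datetime) -> List[FakeGroup]:
    """Keep only groups seen at or after the cutoff (same semantics as __gte)."""
    cutoff = now - ANALYSIS_RANGE
    return [group for group in groups if group.last_seen >= cutoff]
```

A test would then create one group inside the window and one outside it, and assert that only the first is analyzed.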

FYI I use --cov=src --cov-report=html with pytest when I want to figure out how much of my own code is covered.

great tip, thanks!

armenzg · 2022-10-21T13:55:31Z

tests/sentry/tasks/test_find_missing_codemappings.py

+                            "function": "handle_set_commits",
+                            "abs_path": "/usr/src/sentry/src/sentry/tasks.py",
+                            "module": "sentry.tasks",
+                            "in_app": False,


Do you know if in_app is always available? If it's set to False we should not include the frame in our analysis.

See the difference in the UI:

ah that's a great point - I'm not sure. Let me look into it and fix this in a followup.
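A rough sketch of what skipping non-in-app frames could look like. `in_app_filenames` is a hypothetical helper, and treating a missing `in_app` key as "not in app" is an assumption, since the thread leaves open whether the key is always present.

```python
from typing import Any, Dict, List


def in_app_filenames(frames: List[Dict[str, Any]]) -> List[str]:
    """Collect filenames only from frames explicitly marked in_app=True."""
    return [
        frame["filename"]
        for frame in frames
        # Assumption: a frame without the key is excluded, same as in_app=False.
        if frame.get("in_app") is True and "filename" in frame
    ]
```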

armenzg · 2022-10-21T14:02:24Z

src/sentry/conf/server.py

@@ -684,6 +684,7 @@ def SOCIAL_AUTH_DEFAULT_USERNAME():
    Queue("replays.delete_replay", routing_key="replays.delete_replay"),
    Queue("counters-0", routing_key="counters-0"),
    Queue("triggers-0", routing_key="triggers-0"),
+    Queue("find-missing-codemappings", routing_key="find-missing-codemappings"),


It seems we need to register it:

CeleryQueueRegisteredTest.test
AssertionError: Found tasks with queues that are undefined. These must be defined in settings.CELERY_QUEUES.
Task Info:
 - Task: sentry.tasks.find_missing_codemappings, Queue: find_missing_codemappings.
assert not [' - Task: sentry.tasks.find_missing_codemappings, Queue: find_missing_codemappings']

Actually I think it has to do with find-missing-codemappings vs find_missing_codemappings.
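The mismatch can be reproduced with a small stand-alone check. This is a simplified sketch of the kind of comparison the failing test performs, not the actual test code; `undefined_queues` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def undefined_queues(task_queues: Dict[str, str], defined: Iterable[str]) -> List[str]:
    """Return the task names whose queue is missing from the defined queue names."""
    known = set(defined)
    return sorted(task for task, queue in task_queues.items() if queue not in known)


# The task declares an underscore name while the settings entry used hyphens.
tasks = {"sentry.tasks.find_missing_codemappings": "find_missing_codemappings"}
hyphenated = undefined_queues(tasks, ["find-missing-codemappings"])
underscored = undefined_queues(tasks, ["find_missing_codemappings"])
```

Queue names are compared as exact strings, so the hyphenated entry leaves the task's queue undefined while the underscored entry resolves it.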

Co-authored-by: Armen Zambrano G. <armenzg@sentry.io>

armenzg · 2022-10-21T19:41:36Z

src/sentry/conf/server.py

@@ -684,6 +684,7 @@ def SOCIAL_AUTH_DEFAULT_USERNAME():
    Queue("replays.delete_replay", routing_key="replays.delete_replay"),
    Queue("counters-0", routing_key="counters-0"),
    Queue("triggers-0", routing_key="triggers-0"),
+    Queue("find_missing_codemappings", routing_key="find_missing_codemappings"),


Great! No queue test complaints.

Are you going to rename the module name to derive_code_mappings in this PR or another one?

armenzg

🎉

Just address my questions, with code changes if necessary. Well done!

wedamija · 2022-10-21T21:25:23Z

src/sentry/conf/server.py

@@ -684,6 +684,7 @@ def SOCIAL_AUTH_DEFAULT_USERNAME():
    Queue("replays.delete_replay", routing_key="replays.delete_replay"),
    Queue("counters-0", routing_key="counters-0"),
    Queue("triggers-0", routing_key="triggers-0"),
+    Queue("derive_code_mappings", routing_key="derive_code_mappings"),


You'll need to speak to ops to have them assign workers to this queue

wedamija · 2022-10-21T21:26:44Z

src/sentry/tasks/derive_code_mappings.py

+@instrumented_task(  # type: ignore
+    name="sentry.tasks.derive_code_mappings.identify_stacktrace_paths",
+    queue="derive_code_mappings",
+    max_retries=0,  # if we don't backfill it this time, we'll get it the next time
+)
+def identify_stacktrace_paths(
+    organizations: Optional[List[Organization]] = None,
+) -> Mapping[str, Mapping[str, List[str]]]:


How will this be called, and how many orgs are we generally likely to pass?

wedamija · 2022-10-21T21:28:22Z

src/sentry/tasks/derive_code_mappings.py

+    groups = Group.objects.filter(
+        project=project, last_seen__gte=timezone.now() - GROUP_ANALYSIS_RANGE
+    )
+


This could be a significant number of groups, depending on project. If it is too many, you might hit OOM issues.

Since you're just processing one at a time, you could use RangeQuerysetWrapper to keep memory usage bounded.
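The idea behind bounded-memory iteration can be sketched without Django: walk the rows in primary-key order, one fixed-size page at a time. This is a generic illustration, not the wrapper's actual implementation; `iter_by_id_range` and the `fetch_page` callback are hypothetical.

```python
from typing import Callable, Iterator, Sequence, Tuple

Row = Tuple[int, str]  # (id, payload); ids are assumed to be positive


def iter_by_id_range(
    fetch_page: Callable[[int, int], Sequence[Row]], step: int = 1000
) -> Iterator[Row]:
    """Yield rows in id order while holding at most `step` rows in memory.

    fetch_page(after_id, limit) must return up to `limit` rows with
    id > after_id, ordered by id.
    """
    last_id = 0
    while True:
        page = fetch_page(last_id, step)
        if not page:
            return
        yield from page
        last_id = page[-1][0]  # resume after the last id seen
```

Because each page is discarded before the next one is fetched, memory use depends on `step` rather than on the total number of groups.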

I'm also not totally sure you need to fetch all groups from this time range. It seems like you mostly just want to sample a few groups and check their stack trace? Another way to do this is to query snuba for events from the last GROUP_ANALYSIS_RANGE and get a distinct list of platforms from them. That would allow this to be done with one query per project

Actually, even better, you can make use of ProjectPlatform, which is generated by this task

sentry/src/sentry/tasks/collect_project_platforms.py

Lines 24 to 48 in fe07466

def collect_project_platforms(paginate=1000, **kwargs):
    now = timezone.now()

    for page_of_project_ids in paginate_project_ids(paginate):
        queryset = (
            Group.objects.using_replica()
            .filter(
                last_seen__gte=now - timedelta(days=1),
                project_id__in=page_of_project_ids,
                platform__isnull=False,
            )
            .values_list("platform", "project_id")
            .distinct()
        )

        for platform, project_id in queryset:
            platform = platform.lower()
            if platform not in VALID_PLATFORMS:
                continue
            ProjectPlatform.objects.create_or_update(
                project_id=project_id, platform=platform, values={"last_seen": now}
            )

    # remove (likely) unused platform associations
    ProjectPlatform.objects.filter(last_seen__lte=now - timedelta(days=90)).delete()

Should be fine to just look at this - the platform passed via events is stored on the group, and we typically trust the platform passed by the sdk

Actually, I think I'm misunderstanding what this is doing. Where are these paths being used in general?

wedamija · 2022-10-21T21:29:16Z

src/sentry/tasks/derive_code_mappings.py

+
+    all_stacktrace_paths = set()
+    for group in groups:
+        event = group.get_latest_event()


Doing this once per group will likely be quite slow since this will be making n+1 queries. I think you should be able to batch these by passing a list of group ids to a snuba query.

Filed https://getsentry.atlassian.net/browse/WOR-2319 for this
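The batching suggested above could be shaped roughly as follows. `fetch_latest` is a hypothetical callback standing in for a single query that accepts a list of group ids; no real Snuba API is used here.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Split ids into consecutive lists of at most `size` items."""
    for start in range(0, len(ids), size):
        yield list(ids[start : start + size])


def latest_events_batched(
    group_ids: Sequence[int],
    fetch_latest: Callable[[List[int]], Dict[int, dict]],
    batch_size: int = 100,
) -> Dict[int, dict]:
    """Issue one lookup per batch of group ids instead of one per group."""
    latest: Dict[int, dict] = {}
    for batch in chunked(group_ids, batch_size):
        latest.update(fetch_latest(batch))
    return latest
```

For 250 groups and a batch size of 100 this makes three lookups rather than 250.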

snigdhas · 2022-10-24T19:51:41Z

Merging this to keep things moving along. This isn't called anywhere yet and I'll fix a few things in followup PRs.

github-actions bot added the "Scope: Backend" label Oct 19, 2022

Add new task to find missing codemappings

c25d0c8

snigdhas force-pushed the snigdha/script branch from 3a37c4d to c25d0c8 Compare October 19, 2022 19:34

snigdhas marked this pull request as ready for review October 19, 2022 19:35

snigdhas requested a review from armenzg October 19, 2022 19:35

snigdhas requested a review from a team October 20, 2022 00:00

armenzg requested changes Oct 20, 2022

snigdhas and others added 2 commits October 20, 2022 09:28

Apply suggestions from code review

bce98dd

Co-authored-by: Armen Zambrano G. <armenzg@sentry.io>

style(lint): Auto commit lint changes

86190e8

Snigdha Sharma added 2 commits October 20, 2022 13:36

Add typing and fix variable names

708d261

Add docstring comments

e3b6684

snigdhas requested a review from a team as a code owner October 20, 2022 20:37

Define new task

82bd51c

armenzg reviewed Oct 21, 2022

snigdhas and others added 2 commits October 21, 2022 09:52

Update tests/sentry/tasks/test_find_missing_codemappings.py

d065a56

Co-authored-by: Armen Zambrano G. <armenzg@sentry.io>

Apply suggestions from code review

0e065be

Co-authored-by: Armen Zambrano G. <armenzg@sentry.io>

Snigdha Sharma added 2 commits October 21, 2022 11:51

Fix celery queue

e3e86d5

Reduce code duplication in tests

4694a38

armenzg self-requested a review October 21, 2022 19:40

armenzg reviewed Oct 21, 2022

armenzg approved these changes Oct 21, 2022

Rename files

8d1a0e1

wedamija reviewed Oct 21, 2022

snigdhas merged commit cb4dee2 into master Oct 24, 2022

snigdhas deleted the snigdha/script branch October 24, 2022 19:51

snigdhas mentioned this pull request Oct 24, 2022

feat(code-mappings): Use project platform to find Python projects #40465

Merged

snigdhas requested a review from a team October 24, 2022 20:20

armenzg added this to the Automatic code mappings for Python/Github projects milestone Oct 25, 2022

armenzg assigned snigdhas Oct 27, 2022

armenzg added a commit that referenced this pull request Oct 28, 2022

fix(code_mappings): Use organization ID consistently

555b430

Fix for #40271 and SENTRY-WC8

armenzg mentioned this pull request Oct 28, 2022

fix(code_mappings): Use organization ID consistently #40729

Merged

armenzg added a commit that referenced this pull request Oct 28, 2022

fix(code_mappings): Use organization ID consistently (#40729)

cf0cdb3

Fix for #40271 and SENTRY-WC8

priscilawebdev pushed a commit that referenced this pull request Nov 2, 2022

fix(code_mappings): Use organization ID consistently (#40729)

82cfa73

Fix for #40271 and SENTRY-WC8

github-actions bot locked and limited conversation to collaborators Nov 12, 2022
