Bug 1543097: Convert crash reporting from raven to sentry_sdk #4917

jwhitlock · 2019-04-29T21:14:37Z

Switch our crash reporting library from raven to sentry_sdk 0.7.14. This is a follow-on for PR #4912, which didn't avoid huge test diffs like I planned 😢.

Switching from Raven to Sentry SDK

The new SDK works best when initialized with sentry_sdk.init() near the start of execution.
- In the webapp, this is done in webapp-django/crashstats/settings/base.py, and is very close to the documentation for Django integration.
- For configman-based apps, setup_crash_reporting() is now called in the top-level App in socorro/app/socorro_app.py, in the run method. This allows dropping Sentry handling from various derived apps.
The helper socorro/lib/sentry_client.py has a few changes:
- The SDK uses a Hub to keep track of state, data scopes, and the client, and init sets up a global Hub that is used by default. The function get_client, which initialized a raven client with a DSN, is replaced by get_hub, which expects that sentry_sdk.init() has already been called and does much less, but is still a good point for test mocking.
- A new function is_enabled checks if sentry_sdk.init() has been run with a valid DSN. Tests mock this to return True rather than initialize with a fake DSN.
- The function capture_error drops sentry_dsn as the first argument. It uses is_enabled to decide whether to send the exception to Sentry or log it.
The SDK methods have also changed from raven to sentry_sdk. captureException is capture_exception and takes a named parameter error, captureMessage is capture_message, etc.

Filtering sensitive data from events

The previous Raven filters, added in PR #4357, sanitized the Sentry events. Similar filtering is a requirement for switching to the new code. However, it is hard to verify the filtering implementation because there are no tests for the existing filtering (it is assumed Raven has those tests), the event format has changed significantly with the new SDK (apparently to standardize events across languages), and some collected data varies from development to deployment environments. This PR makes an effort to replicate the Raven filters, and I expect that there will be some additional filtering needed once we see events in the stage deployment environment.

In sentry_sdk, a lot of sensitive data collection is covered by the init parameter send_default_pii. It is disabled by default, and when enabled it selectively adds data:

In the Django integration, it sends user data like an ID, email and username.
The wsgi integration, included in the Django integration, includes the user's IP address, headers with IP address info, and headers for cookies or auth. It will collect cookies for easier display.
None of the default integrations, which are used by processor and other configman-based tools, vary on send_default_pii.

This PR explicitly sets send_default_pii=False, so future devs will know we decided to omit this data rather than unknowingly accepting the default. Skipping cookies does the same work as the Raven filters for keywords sessionid, csrftoken, anoncsrf, and sc.

The SanitizePasswordProcessor added some other keys that look like authorization data, such as password and api_key. Additional code is needed to filter similar data with sentry_sdk, which takes an init parameter before_send to filter events and breadcrumbs.

When analyzing events in the development environment, only the web app appeared to have PII associated with the visitor, so the processing code lives in the web app. This PR adds a SentryProcessor class that runs a sequence of filters against an event, and includes some optional debug logging for development. The code could be trivially expanded to sanitize events from the processor app, or to sanitize breadcrumbs (data gathered during execution that may be useful if an exception occurs).

The current filters are:

Filter SQL query breadcrumbs that include a sensitive column, such as email, username, session_key or tokens_token.key, to avoid exposing PII or security data.
Mask Auth-Token in HTTP headers
Mask csrfmiddlewaretoken in POST data
Mask code and state, used in the OAuth flow, from query strings
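The header, POST, and query-string masks in the list above can be sketched as one before_send callable. This is a minimal illustration, not the code in this PR: it assumes the event is a plain dict with a "request" section holding "headers", "data", and "query_string", and the names mask_request and MASK are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode

MASK = "[filtered]"  # hypothetical placeholder value


def mask_request(event):
    """Mask known-sensitive request fields in a Sentry event dict, in place."""
    request = event.get("request") or {}

    # HTTP headers: mask the API token header, matching case-insensitively.
    headers = request.get("headers")
    if isinstance(headers, dict):
        for name in headers:
            if name.lower() == "auth-token":
                headers[name] = MASK

    # POST data: mask Django's CSRF form field when the body is a dict.
    data = request.get("data")
    if isinstance(data, dict) and "csrfmiddlewaretoken" in data:
        data["csrfmiddlewaretoken"] = MASK

    # Query string: mask the OAuth "code" and "state" parameters.
    query = request.get("query_string")
    if isinstance(query, str) and query:
        pairs = parse_qsl(query, keep_blank_values=True)
        masked = [(k, MASK if k in ("code", "state") else v) for k, v in pairs]
        # Only rewrite the string when something was actually masked, so
        # untouched query strings are never re-encoded.
        if masked != pairs:
            request["query_string"] = urlencode(masked)


def before_send(event, hint):
    """Signature expected by sentry_sdk.init(before_send=...)."""
    mask_request(event)
    return event
```

Re-encoding the query string percent-encodes the mask itself, so a masked parameter shows up as code=%5Bfiltered%5D in the event.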

Future work

I was unable to get enough data in the development environment to write some planned filters. I hope to return to these with data from stage or production:

Masking sensitive variables in call stacks (none found in dev)
Masking sensitive environment from gunicorn breadcrumbs (not recorded in dev, may be Raven-specific)

This code should be merged at a time when we'll have a couple of working days to gather data in stage and create new filters as needed.

Testing this PR

For my local testing, I added these to my.env:

SENTRY_DSN=https://bad-9c7b6b6bcb7a45d7ae01fe8f60594083@sentry.127.0.0.1.nip.io/0
SENTRY_DEBUG=True
resource.sentry.debug=True

The SENTRY_DSN could be set to send development events to our Sentry collector for stage, which was done in testing for PR #4357, but I found the Sentry debug logging to be sufficient for development of the code.

jwhitlock

To test this, you could add a DSN for stage to the dev environment, and maybe set debug=True in sentry_sdk.init. A similar process to the last PR may work, where we put it in stage for 24 hours to get some cron errors, and raise an exception to test the Django side.

jwhitlock · 2019-04-29T21:21:34Z

socorro/app/socorro_app.py

@@ -206,6 +216,12 @@ def fix_exit_code(code):

            config_manager.log_config(mylogger)

+            # Add version to crash reports
+            version = (revision_data.get('version') or


This is a copy of socorro.lib.revision_data.get_version that doesn't read the JSON from disk again. This may be a mis-optimization. I'm not sure if get_version should take an optional revision_data, or I should just close my eyes and call get_version 🙈.

I went with the optional revision_data.

jwhitlock · 2019-04-29T21:23:23Z

socorro/unittest/app/test_socorro_app.py

@@ -22,8 +22,9 @@ def test_instantiation(self):
        with pytest.raises(NotImplementedError):
            sa.main()

+    @mock.patch('socorro.app.socorro_app.setup_crash_reporting')


This mock avoids sentry_sdk choking when processing a MagicMock as a DSN string.

jwhitlock · 2019-04-29T21:26:00Z

socorro/unittest/processor/test_processor_2015.py

-
+    @patch('socorro.lib.sentry_client.get_hub', side_effect=Exception('fail'))
+    @patch('socorro.lib.sentry_client.is_enabled', return_value=False)
+    def test_rule_error_sentry_disabled(self, is_enabled, mock_get_hub):


test_rule_error used to test both with a blank DSN and a fake DSN. When converting to a mocked is_enabled, it made sense to split the test.

jwhitlock · 2019-04-29T21:27:20Z

webapp-django/crashstats/cron/management/commands/cronrun.py

-                return self.cmd_run_one(options['job'], options['force'], cmd_args)
-            else:
-                return self.cmd_run_all()
-        except Exception:


I dropped this try block because the default handler does what the except block does - log the uncaught exception to Sentry on exit.

willkg

I think we should either filter the crash data or get an ok from @g-k to skip it.

These changes have ramifications if they're not correct--we'll miss errors. Given that, I would write a rough test plan to show what we've thought through and surface holes. Then when this deploys to stage, we have a plan we can use to verify it.

socorro/app/socorro_app.py

willkg · 2019-04-29T21:31:30Z

socorro/cron/crontabber_app.py

-        default='',
-        reference_value_from='secrets.sentry',
-        secret=True
-    )


Nice job lifting this up from the apps to the App.

willkg · 2019-04-29T21:40:56Z

webapp-django/crashstats/settings/base.py

+        dsn=SENTRY_DSN,
+        release=SOCORRO_REVISION,
+        send_default_pii=True,
+        integrations=[DjangoIntegration()])


I know we talked about this a bit. I think I'm going to change my mind and suggest we filter the event before sending it along.

This seems pretty straight-forward:

https://docs.sentry.io/error-reporting/configuration/filtering/?platform=python

@g-k is our security person. It's worth having him eyeball this and 👍 / 👎 what we do.

We should try to maintain raven's client side filtering to avoid accidentally collecting PII.

We have audited Sentry for storing PII (part of the reason we self-host it), can filter it server side, and can probably purge PII if we do collect it, but not collecting it in the first place would be best.

Looking at getsentry/sentry-python#211 it looks like there isn't an equivalent for the raven filters and processors and we'd need to handle that in a before_send hook (looks like the raven filters are matching on field names and value regexes https://github.com/getsentry/raven-python/blob/master/raven/processors.py#L165).

jwhitlock pointed out some additional nuance, so I'm going to run this by the rest of the secops team.

Perfect timing on bringing this up right before the secops team meeting! So our consensus was that if we can filter out more PII client side we should do that, but it's also OK to continue collecting the same user fields since crash reports already contain user submitted PII and we need to associate users with their crashes.

And if the new sdk does less filtering we should try to preserve the behavior of raven's filters.

I just chatted with @g-k and he pointed out that web users don't opt-in to webapp crash reports to sentry and so they're not really the same thing as firefox crash reports to socorro.

I've been on Socorro for like 3 years and I don't think I've looked at any of those fields before. Sure, past is not prologue, but I think that's probably a good indicator.

I vote we go the easy route and do send_default_pii=False. If that turns out to be a problem in the future, we can revisit.

OK some things from further discussion with willkg:

"webapp crashes -> sentry are really different than firefox crashes -> socorro" (especially in that they aren't opt-in)

socorro usually goes out of its way to not associate Fx crashes with users (as in it's a legal or data requirement), so that's not a reason to collect PII

willkg can't recall when he last used the PII fields so we can set that flag to false to not collect them

we should re-implement the sanitizers from raven to preserve existing behavior (and upstream the change if sentry is open to it)

willkg · 2019-04-29T21:43:46Z

webapp-django/crashstats/cron/management/commands/cronrun.py

-                return self.cmd_run_one(options['job'], options['force'], cmd_args)
-            else:
-                return self.cmd_run_all()
-        except Exception:


willkg · 2019-04-29T21:45:16Z

socorro/unittest/processor/test_processor_2015.py

-
+    @patch('socorro.lib.sentry_client.get_hub', side_effect=Exception('fail'))
+    @patch('socorro.lib.sentry_client.is_enabled', return_value=False)
+    def test_rule_error_sentry_disabled(self, is_enabled, mock_get_hub):


willkg · 2019-04-29T22:05:32Z

Nice work!

jwhitlock · 2019-04-29T23:08:27Z

I'm guessing another day to write and test a before_send function. To get an idea what we're currently collecting, see the ZeroDivisionError from stage. I may need to send some data from my developer environment to stage with this library for comparison.

willkg · 2019-04-30T00:44:29Z

socorro/lib/revision_data.py

    :returns: string

    """
-    revision_data = get_revision_data()


I don't think the result of get_version will ever change during a run, so we could cache the result in the module and then return that. We do module-level caching in other places, too, so there's some precedent.

It's up to you.
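Both options discussed for get_version (an optional revision_data argument, and caching so the JSON is read from disk only once) can be sketched together. This is an illustration only: the default path, the "unknown" fallback, and the use of functools.lru_cache are assumptions, not the contents of socorro/lib/revision_data.py.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=None)
def get_revision_data(path="version.json"):
    """Read revision data from disk once per path; later calls hit the cache.

    The default path is hypothetical.
    """
    try:
        with open(path) as fp:
            return json.load(fp)
    except (OSError, ValueError):
        # Missing or malformed file: behave as if there is no revision data.
        return {}


def get_version(revision_data=None):
    """Return the version string, reusing revision_data if the caller has it."""
    if revision_data is None:
        revision_data = get_revision_data()
    # "unknown" is a hypothetical fallback for this sketch.
    return revision_data.get("version") or "unknown"
```

With the optional argument, a caller that has already loaded the data passes it in; with the cache, even callers that do not pass it avoid a second read.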

willkg · 2019-04-30T18:52:28Z

I'd feel better if this baked on stage for a bit. I want to push the stuff that's on stage now to prod before we land this. I'm thinking we'll do a prod deploy tomorrow. I'm going to add the DO NOT MERGE label so I don't forget.

jwhitlock · 2019-04-30T22:45:18Z

Today was research and a little code. Setting send_default_pii = True adds some data:

In the Django integration, it sends user data like an ID, email and username.
The wsgi integration, included in the Django integration, includes the user's IP address, headers with IP address info, and headers for cookies or auth. It will collect cookies for easier display.
None of the default integrations, which are used by processor and other configman-based tools, vary on send_default_pii.

We can turn off send_default_pii without losing useful debugging data ✅ .

The raven processors work by trying to automatically filter sensitive data by keywords across data, HTTP headers, and stacktraces. We could potentially port this, but the Sentry dev says:

We don’t have a built-in password processor (mostly because it was never working perfectly for all usecases), but you can modify your events using before_send

This makes me think that copying the raven logic to our before_send and before_breadcrumb methods will potentially send sensitive data. Breadcrumbs, which contain SQL queries and gunicorn data, seem to be the likely candidates in testing. I think that we should add targeted filtering, and use some of the testing techniques from PR #4357, including some time in stage, and reading the event JSON closely.

jwhitlock · 2019-05-01T22:25:33Z

I'll need to get a few more examples of filters before I can get a good API. I thought plain functions would be the simplest thing that could work, but since there is a little configuration and registration, classes with a __call__ method might be better.

Filters to write:

Call stack: Look for sensitive variable names like password, csrf, etc. Might be in locals or calling arguments
Gunicorn logs: Anonymize sensitive headers like REMOTE_ADDR
URL query parameters, such as those passed in OAuth requests

There are probably a few more. I'm getting the low-hanging fruit in the dev environment, and then looking at existing reports for more ideas. I'll push my work in progress for the curious, but no need to review the PR yet.

g-k · 2019-05-02T14:23:13Z

👍 I was assuming we had legal or compliance requirements to use the old SDK filters, but if they are broken or incomplete then we don't have to port that behavior over (especially since we can set send_default_pii=False now and filter server side).

jwhitlock · 2019-05-02T22:46:34Z

I think @Osmose did a careful job of validating the raven-provided filters in PR #4357, and I want to make sure we're not adding data that was previously filtered.

The event format has changed with sentry_sdk, probably to standardize formatting across languages and platforms, which means I can't use existing raven-generated events to determine how to write my filters. The production stack (such as AWS load balancers) also adds some headers that I can't test locally, because I'm not sure how they will show up in the sentry_sdk event format. That's OK, we'll be mostly covered by send_default_pii=False, and I'll be able to quickly add some more rules as needed after capturing an event in staging. But it means I won't be able to filter gunicorn headers in this PR, since I can't see them in development (which means they may not even be present in staging).

That leaves a couple more tasks that I can do in development:

Sanitize POSTs, along with CSRF in form data and headers
Sanitize API token usage
Sanitize the call stack variables

I hope to open for review by Monday.

jwhitlock · 2019-05-03T20:05:05Z

This is ready for review again.

I developed the sanitizers by turning up debugging in my.env:

SENTRY_DSN=https://bad-9c7b6b6bcb7a45d7ae01fe8f60594083@sentry.127.0.0.1.nip.io/0
SENTRY_DEBUG=True
resource.sentry.debug=True

That fake DSN ensured that Sentry events were generated but went nowhere. I then added some raise Exception('here') lines, generated a Sentry event, formatted with jq, looked for trouble, and wrote some code. Rinse and repeat.

The code is overly flexible:

I added the ability to process breadcrumbs, which are added in response to SQL queries, logs, HTTP requests incoming and outgoing, etc. However, you can wait to sanitize these until the event is raised. It is possible that all the code to implement before_breadcrumb should be removed, and the code should focus on before_send.
I added before_send and before_breadcrumb handlers to the processor, but they just have optional debugging, and no filtering. I couldn't find any sensitive data outside of the crash data, which we're not filtering. That could mean all the filtering code belongs in the user-facing webapp, and shouldn't be in socorro/lib/sentry_client.py.

The code is also too specific:

I wrote a filter for each type of data: Headers, query strings, POST data, etc. A more generic filter could take a path in the event object using glom, and then process it in a more generic way. However, each content type has wrinkles, and it was developed incrementally, so I went with a more targeted approach.
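The more generic, path-based filter mentioned here could look roughly like the sketch below. It uses a small standard-library path walker as a stand-in for glom, and the names get_path and mask_keys are hypothetical.

```python
MASK = "[filtered]"  # hypothetical placeholder value


def get_path(event, path):
    """Return the value at a dotted path such as "request.headers", or None."""
    current = event
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def mask_keys(event, path, keys):
    """Mask the named keys of the dict found at path, in place.

    Key matching is case-insensitive; a missing section is silently skipped.
    """
    section = get_path(event, path)
    if not isinstance(section, dict):
        return
    wanted = {key.lower() for key in keys}
    for name in section:
        if name.lower() in wanted:
            section[name] = MASK
```

One function then covers headers and POST data, for example mask_keys(event, "request.headers", ["Auth-Token"]). Content types with their own wrinkles, such as query strings, would still need dedicated handling.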

I'm not sure if I should tighten up the code, or refactor into some more general functions, or let it be for the first version.

As I mentioned earlier, the event format changed from raven to sentry_sdk, and the existing raven data is different than this data. If the gunicorn request details appear in staging, there may be more filtering to do, to omit IP address data from gunicorn header logs. I suspect there will be one or more rounds of changes.

jwhitlock

After sleeping on it, I want to see if the hint for query breadcrumbs has some options for more targeted sanitation. There may be a parsed SQL available, since Django 2.2 now requires sqlparse, or maybe there is a way to identify sensitive columns in Django itself.

Update: The hint is empty for SQL breadcrumbs, no help there. sqlparse could be used to write a very cool sanitizer, but it is probably too risky for v1 of a crash handler.

Then I want a YAGNI pass, to remove bits of the code that are there for flexibility and future expansion. We can worry about n=2 later, let's get n=1 right.

socorro/app/socorro_app.py

socorro/lib/sentry_client.py

jwhitlock · 2019-05-03T20:11:11Z

webapp-django/crashstats/crashstats/sentry.py

+from socorro.lib.sentry_client import SentryFilter, SentryProcessor, SENTRY_LOG_NAME
+
+
+class CrumbTruncateQuery(SentryFilter):


For SQL queries, I opted to truncate the SQL statement when a sensitive column is mentioned, rather than replace the whole thing with [filtered], or trying to replace the values with [filtered].
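A rough sketch of the truncation idea follows. It is not the CrumbTruncateQuery implementation: the keyword list, the cut point (the first sensitive keyword), and the assumption that SQL breadcrumbs carry category "query" with the statement in "message" are all illustrative.

```python
SENSITIVE = ("email", "username", "session_key")  # illustrative keyword list
MARKER = "[filtered]"  # hypothetical placeholder value


def truncate_query(sql, keywords=SENSITIVE):
    """Cut sql off at the first sensitive keyword; return it unchanged if none."""
    lowered = sql.lower()
    positions = [pos for pos in (lowered.find(k) for k in keywords) if pos != -1]
    if not positions:
        return sql
    return sql[:min(positions)] + MARKER


def scrub_query_crumbs(event):
    """Truncate sensitive SQL breadcrumbs in an event dict, in place."""
    crumbs = event.get("breadcrumbs") or []
    # Some SDK versions wrap the list as {"values": [...]}.
    if isinstance(crumbs, dict):
        crumbs = crumbs.get("values") or []
    for crumb in crumbs:
        message = crumb.get("message")
        if crumb.get("category") == "query" and isinstance(message, str):
            crumb["message"] = truncate_query(message)
```

Cutting at the keyword keeps the leading part of the statement, which still shows which tables were involved, while dropping everything from the sensitive column onward, including any literal values.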

webapp-django/crashstats/crashstats/sentry.py

jwhitlock · 2019-05-03T20:20:53Z

webapp-django/crashstats/crashstats/sentry.py

+    """
+    log_name = SENTRY_LOG_NAME if debug else None
+    filters = (
+        EventProcessCrumbs((


This is the important part, which sets up the actual filtering used in the webapp.

jwhitlock · 2019-05-03T20:22:17Z

webapp-django/crashstats/crashstats/tests/test_sentry.py

+class TestCrumbTruncateQueryForPrivacy:
+    """Tests for CrumbTruncateQueryForPrivacy."""
+
+    CASES = (


This CASES pattern is used a lot to take data I saw during development and make test cases out of them. They are used here in the tests for the filter class, and later in the tests for get_before_send to test the webapp configuration.

webapp-django/crashstats/settings/base.py

jwhitlock · 2019-05-03T20:31:15Z

webapp-django/crashstats/crashstats/sentry.py

+        self.keywords = keywords
+        self.reason = reason
+
+    def __repr__(self):


I used __repr__ because I was listening to Raymond Hettinger's talk while I was writing this: https://youtu.be/wf-BqAjZb8M?t=1980

willkg · 2019-05-07T14:14:43Z

I want to think through this, but don't have the head space for it right now. Hopefully later this week.

jwhitlock · 2019-05-08T16:41:42Z

I'm going to take another refactoring pass on this:

Drop breadcrumb parsing, because it appears we won't need it
Generalize the sanitizing algorithm, so that each data type isn't as much of a unique codepath

jwhitlock · 2019-05-10T16:32:50Z

I'm glad I had a chance to refactor this code, I'm a lot more comfortable with it. I've also edited the initial PR comment to cover the additional scope of filtering events for PII and secure information, so that future devs don't have to read all the 40+ comments.

We discussed in the meeting that this code is not urgent, and requires a couple of days to bake in stage. We can defer review and merging until we have that time. The code is in a good state to hang out on the shelf for a while, and I can rebase when needed.

willkg

I have a bunch of comments, but I like this approach and I appreciate that it's really easy to extract and turn into a general purpose library.

If you want to hop on zoom and talk about any of the issues, that might get us to a resolution faster than going back and forth with PR comments. Woot!

socorro/app/socorro_app.py

willkg · 2019-05-13T19:45:50Z

socorro/unittest/processor/test_processor_app.py

@@ -62,9 +62,6 @@ def get_standard_config(self, sentry_dsn=None):
        mocked_companion_process = mock.Mock()
        config.companion_process.companion_class = mock.Mock(return_value=mocked_companion_process)

-        config.sentry = mock.MagicMock()
-        config.sentry.dsn = sentry_dsn


config is a DotDict, so you need to add a namespace to it. You can do that like this:

config.sentry = DotDict()
config.sentry.dsn = sentry_dsn

This code was removed in the PR. With raven, the sentry DSN was passed at the time of the exception. With sentry_sdk, Sentry is initialized from this value at run, which is not run for these tests.

My strategy was to mock is_enabled to avoid initialization, but alternatively the tests could call pa.setup_crash_reporting(config, 'fake_version') around the time they call pa._setup_source_and_destination().

webapp-django/crashstats/crashstats/sentry.py

willkg · 2019-05-13T19:52:27Z

webapp-django/crashstats/crashstats/sentry.py

+    filtered at creation, but most crumbs are discarded since an event isn't usually generated, so
+    it makes sense to wait until event sanitization to process it.  The SanitizeBreadcrumbs filter
+    runs breadcrumb filters (that don't need data from the hint) at event processing rather than
+    at crumb creation.


These comments are super helpful. 👍

webapp-django/crashstats/crashstats/sentry.py

willkg · 2019-05-13T22:11:04Z

webapp-django/crashstats/crashstats/sentry.py

+            else:
+                data_out[key] = value
+        if modified:
+            glom.glom(event, glom.Assign(self.section_path, data_out))


This is kind of funny. They added Assign because I asked for it, but I think this is the first use we've had of it.

webapp-django/crashstats/crashstats/sentry.py

webapp-django/crashstats/crashstats/tests/test_sentry.py

willkg · 2019-05-14T11:54:48Z

Oh, I had one other thought. This adds a lot of code and while there are tests, they're tests for known scenarios and don't account for change in sentry-sdk. What happens when this code fails? How would we find out?

Maybe it makes sense to have the before_send in a big try/except block that logs the exception if anything happens? That gets us something in the logs that we can watch after deploys that might affect sentry things. It doesn't pro-actively notify us, though.

Use the standard Django integration of sentry_sdk to replace raven. We're using the send_default_pii=False (the default), to explicitly not collect user data like email addresses, and connection data like IP addresses and cookies.

Switch to the sentry_sdk initialized in settings:
* Use the default exception handler for handle
* Use sentry_sdk.capture_error() in _run_once

Switch management command depcheck to use the settings-initialized sentry_sdk.

jwhitlock

I've made a number of changes, and there's some more work to do

Exceptions in `before_send`

The sentry code wraps the before_send call:

https://github.com/getsentry/sentry-python/blob/78d17d1847f885fff9164ee21051df2a1c2c65be/sentry_sdk/client.py#L153-L160

I tested this by raising an exception. There is:

A multi-line ERROR-level log from sentry "Internal error in sentry_sdk", followed by the original traceback and the new exception.
A single-line INFO-level log that prints the event
Django's ERROR-level log of the original traceback and the new exception.

The event is not sent to Sentry if before_send raises an exception.

Log level of `before_send` loggers

The suggestion is to change the before_send loggers to log at DEBUG instead of INFO. The default log level is INFO, so DEBUG messages are swallowed. A few ways forward:

Continue logging at INFO, if the SENTRY_DEBUG is set
Always log at DEBUG, and require the developer to switch to DEBUG to see them
Always log at DEBUG, and dynamically add a handler for DEBUG messages if SENTRY_DEBUG is set

Next set of work

Convert to filters that just return the event, not the tuple of the event and if it was modified
Redo logging configuration so that debug_logger.debug() works
Switch the query processing to filter rather than truncate
Indent multi-line SQL test cases
Determine if querystring parsing can be used without corrupting data
Try "production-like" mode, and see if I can get the extended gunicorn breadcrumbs from production

jwhitlock · 2019-05-14T14:54:24Z

socorro/unittest/processor/test_processor_app.py

@@ -62,9 +62,6 @@ def get_standard_config(self, sentry_dsn=None):
        mocked_companion_process = mock.Mock()
        config.companion_process.companion_class = mock.Mock(return_value=mocked_companion_process)

-        config.sentry = mock.MagicMock()
-        config.sentry.dsn = sentry_dsn


This code was removed in the PR. With raven, the sentry DSN was passed at the time of the exception. With sentry_sdk, Sentry is initialized from this value at run, which is not run for these tests.

My strategy was to mock is_enabled to avoid initialization, but alternatively the tests could call pa.setup_crash_reporting(config, 'fake_version') around the time they call pa._setup_source_and_destination().

webapp-django/crashstats/crashstats/sentry.py

webapp-django/crashstats/crashstats/tests/test_sentry.py

willkg · 2019-05-14T20:14:28Z

Exceptions in before_send

What if we wrap the before_send in a try/except that uses markus to send metrics to datadog whenever it fails and then have it re-raise? Something like this:

metrics = markus.get_metrics('sentry')

...

def before_send(whatever):
    try:
        whatever
    except Exception:
        metrics.incr('before_send_exception')
        raise

If markus is configured at that point, it'll send something and we can see it in datadog.

Does that sound like a good idea? I'm not sure how else we can get signals out.
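A runnable version of that idea, for illustration. CountingMetrics is a stand-in for the object markus.get_metrics('sentry') would return, so the sketch has no third-party dependency; the decorator name and the example filter step are hypothetical.

```python
import functools


class CountingMetrics:
    """Stand-in for a markus metrics interface; only incr() is modeled."""

    def __init__(self):
        self.counts = {}

    def incr(self, stat, value=1):
        self.counts[stat] = self.counts.get(stat, 0) + value


metrics = CountingMetrics()


def count_failures(func):
    """Wrap a before_send callable so failures are counted, then re-raised."""

    @functools.wraps(func)
    def wrapper(event, hint):
        try:
            return func(event, hint)
        except Exception:
            metrics.incr("before_send_exception")
            raise

    return wrapper


@count_failures
def before_send(event, hint):
    # Hypothetical filter step; raises KeyError on an event with no request.
    event["request"]["headers"].pop("Auth-Token", None)
    return event
```

Because the exception is re-raised, the SDK's own handling of a failing before_send is unchanged; the counter only adds a signal that can be watched after deploys.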

Log level of before_send loggers

I suggested we switch it to debug because I thought we were looking at debug level in the local dev environment. Now that I know it's not logging at debug level, I take back my comment--let's leave it as you have it, since if someone has debug mode enabled, it seems onerous and confusing to require multiple actions to get debugging to work.

willkg · 2019-05-14T20:30:27Z

#4917 (comment)

It's more readable to me if we adjust where the parens are. It's a tuple of three elements with the second element being a really long string. I think this is more readable:

        # Select a user by email
        (
            'email',
            (
                'SELECT "auth_user"."is_active" FROM "auth_user" WHERE'
                ' UPPER("auth_user"."email"::text) = UPPER(\'username@example.com\')'
            ),
            (
                '...'
            )
        )

When I was reviewing that code, I had to count parens to figure out where the three elements were demarcated. That seems really prone to error.

jwhitlock · 2019-05-14T21:56:20Z

I think the logging is a little cleaner now. Instead of passing around a logger, there's a module-level logger that gets called when needed, and the logic to "turn on" the messages moves to settings. Take a look and see if you agree.

Incrementing a counter with markus will work to track exceptions. When we make it a library, I don't know if we want to require markus as a dependency. But, I think your code works for v1, and we can worry about the library later.

I'm continuing with migrating the return from (event, modified?) to just event. Working through this change suggests another. The event is modified in place, and returned to signify that it wasn't dropped. None of our code drops events, however, and it seems misleading that we're modifying the input as well as returning it.

Instead, it seems better to raise an exception when the event (or breadcrumb) should be dropped instead of processed, catch the exception in SentryProcessor, and return None. But none of the current filters drop events or crumbs, so I think I'm just going to drop the return value, and adjust docstrings to make it clear that the event is modified in place.
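The in-place convention described here can be sketched as follows. This is a simplified illustration, not the classes in this PR: the filter shown is hypothetical, and the real SentryProcessor also handles debug logging.

```python
class MaskAuthToken:
    """Hypothetical filter: masks the Auth-Token header.

    Modifies the event in place and returns nothing.
    """

    def __call__(self, event, hint):
        headers = event.get("request", {}).get("headers", {})
        for name in headers:
            if name.lower() == "auth-token":
                headers[name] = "[filtered]"


class SentryProcessor:
    """Run a sequence of in-place filters, then hand the event back to the SDK."""

    def __init__(self, filters):
        self.filters = tuple(filters)

    def __call__(self, event, hint):
        for event_filter in self.filters:
            # Filters mutate the event; any return value is ignored.
            event_filter(event, hint)
        # Returning the event (rather than None) tells the SDK to send it.
        return event
```

An instance such as SentryProcessor([MaskAuthToken()]) has the (event, hint) signature that before_send expects, so only the processor needs to know about the SDK's return-value contract.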

willkg · 2019-05-15T13:06:31Z

webapp-django/crashstats/tests/test_sentrylib.py

-        ids=tuple(case[0] for case in CASES))
-    def test_filtered_queries(self, keyword, message, expected):
+        'keyword,sql', CASES.items(),
+        ids=tuple(case for case in CASES))


Does this line up right? Does CASES.items() return the same as iter(CASES)?

In my experience, yes, and that's my understanding from the dictionary view docs: https://docs.python.org/3.6/library/stdtypes.html#dictionary-view-objects
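A small check of that claim, using made-up cases. CASES.items() yields (key, value) pairs while iterating CASES yields only keys, so the two are not equal, but both walk the dict in the same order, which is what matters for pairing parameters with ids.

```python
# Hypothetical cases, keyed by the sensitive keyword they exercise.
CASES = {
    "email": 'SELECT "id" FROM "auth_user" WHERE "email" = %s',
    "username": 'SELECT "id" FROM "auth_user" WHERE "username" = %s',
    "session_key": 'SELECT "expire_date" FROM "django_session" WHERE "session_key" = %s',
}

params = list(CASES.items())         # (keyword, sql) pairs, as passed to parametrize
ids = tuple(case for case in CASES)  # keys only, as passed to ids=

# Each id is the key of the pair at the same position.
for (keyword, sql), test_id in zip(params, ids):
    assert keyword == test_id
```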

jwhitlock · 2019-05-15T16:14:26Z

My remaining task is to run in production mode, and see if I can get the sensitive breadcrumbs I'm seeing in the raven-generated events in stage. That may add another sanitizer, but I don't expect the current sanitizers to change, so this may be another good place to review.

Update: The event is fundamentally the same in production mode, so no further changes.

I hope to squash all these review commits into the "Filter events for pii/security" commit, because I don't think there is a long-term benefit to all these follow-on commits. If you want me to squash before you review, let me know.

Update 2: I've rebased and squashed the commits

Convert ProcessorApp to use sentry_sdk:
* Move crash reporting initialization to the base socorro_app.App
* Assume sentry is initialized in sentry_client helpers
* Remove now unused Raven library

* Change sentry_client.get_client to get_hub, to better match sentry_sdk, and update the test mocks
* Add sentry_client.is_enabled() to check if Sentry has an initialized hub and dsn, and mock to return True in tests
* Drop the sentry_dsn parameter from both get_hub and capture_error
* Drop local handling of sentry_dsn in each app, defer to App, and simplify test config setup accordingly
* Split test_rule_error from test_processor_2015.py into two tests, one with Sentry disabled, one enabled

willkg

The markus bits need to get fixed and I'd make the CASES declarations more readable. This looks good though. After those fixes I think we can land this and test it out on stage.

Do you have a test plan for this?

webapp-django/crashstats/sentrylib.py

webapp-django/crashstats/tests/test_sentrylib.py

jwhitlock · 2019-05-16T15:56:38Z

Fixes applied in follow-on commits. I'd like to squash these as well before merging.

My test plan:

Use "Crash me now" to generate a webapp report in staging, look at the generated event
Wait for a KeyError: 'java_stack_trace_' exception (2 in last 48 hours, might be possible to induce with re-submit), look at the generated event.

There's a good chance that there will be additional scrubbing needed in the deployed environments, based on crumbs that appear in the deployed raven events but not in the development environment with sentry_sdk.

willkg

Looks great! 👍

Sorry this was such a slog--I was kind of learning about the bits as we went along. 😞

Implement before_send on the webapp to clear personal information and security data from the event (exception or message) before sending to Sentry:
* Truncate SQL queries when a sensitive column is mentioned
* Filter HTTP headers to mask Auth-Token
* Filter POST data to mask CSRF tokens
* Filter query strings to mask OAuth parameters

jwhitlock · 2019-05-16T16:25:04Z

I was learning on this one too. Thanks for the feedback on coding styles, markus, and the rest.

jwhitlock force-pushed the sentry-sdk-1543097 branch from 9bc0108 to e5c5c23 on April 29, 2019 21:16

jwhitlock commented Apr 29, 2019

jwhitlock requested a review from willkg April 29, 2019 21:31

willkg reviewed Apr 29, 2019

willkg reviewed Apr 30, 2019

jwhitlock force-pushed the sentry-sdk-1543097 branch from b66e8f6 to 1820724 on April 30, 2019 15:18

willkg added the Do not merge label Apr 30, 2019

jwhitlock force-pushed the sentry-sdk-1543097 branch from 1820724 to 910d893 on May 1, 2019 19:38

jwhitlock added the Not ready Not ready for code review label May 1, 2019

willkg removed the Do not merge label May 2, 2019

jwhitlock force-pushed the sentry-sdk-1543097 branch 2 times, most recently from 8d9e7d2 to 7a4b7a7 on May 2, 2019 22:34

jwhitlock force-pushed the sentry-sdk-1543097 branch 2 times, most recently from abb1a89 to 0a93e47 on May 3, 2019 18:39

jwhitlock added Do not merge and removed Not ready Not ready for code review labels May 3, 2019

jwhitlock commented May 4, 2019

jwhitlock added Not ready Not ready for code review and removed Do not merge labels May 8, 2019

jwhitlock force-pushed the sentry-sdk-1543097 branch from e827c02 to b7eb920 on May 10, 2019 03:24

jwhitlock force-pushed the sentry-sdk-1543097 branch from 2bddd72 to 394eace on May 10, 2019 15:29

jwhitlock added Do not merge No rush and removed Not ready Not ready for code review labels May 10, 2019

willkg requested changes May 13, 2019

jwhitlock added 3 commits May 14, 2019 11:07

bug 1543097: Convert webapp to sentry_sdk 0.7.14

e6e988a

Use the standard Django integration of sentry_sdk to replace raven. We're using send_default_pii=False (the default) to explicitly avoid collecting user data like email addresses, and connection data like IP addresses and cookies.
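For reference, a settings fragment along these lines is what the standard Django integration usually looks like. It is a sketch rather than the contents of webapp-django/crashstats/settings/base.py; reading the DSN from a `SENTRY_DSN` environment variable is an assumption, and it needs both sentry_sdk and Django installed.

```python
import os

import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

# Assumption: the DSN comes from the environment; with no DSN set,
# the SDK is initialized but nothing is sent.
sentry_sdk.init(
    dsn=os.environ.get("SENTRY_DSN"),
    integrations=[DjangoIntegration()],
    # False is already the default; passing it makes the choice explicit.
    send_default_pii=False,
)
```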

bug 1543097: Switch mgmt cmd cronrun to sentry_sdk

a732dd8

Switch to the sentry_sdk initialized in settings:
* Use the default exception handler for handle
* Use sentry_sdk.capture_error() in _run_once

bug 1543097: Switch cmd depcheck to sentry_sdk

98e98fa

Switch management command depcheck to use the settings-initialized sentry_sdk.

jwhitlock force-pushed the sentry-sdk-1543097 branch from 394eace to 2552a15 on May 14, 2019 16:07

jwhitlock commented May 14, 2019

willkg reviewed May 15, 2019

jwhitlock added 2 commits May 16, 2019 09:06

bug 1543097: Convert processor to sentry_sdk

3bc2d34

Convert ProcessorApp to use sentry_sdk:
* Move crash reporting initialization to the base socorro_app.App
* Assume sentry is initialized in sentry_client helpers
* Remove now unused Raven library

jwhitlock force-pushed the sentry-sdk-1543097 branch from 26070fb to 9d63a75 on May 16, 2019 14:08

willkg reviewed May 16, 2019

willkg approved these changes May 16, 2019

jwhitlock force-pushed the sentry-sdk-1543097 branch from a6cb593 to 1162b62 on May 16, 2019 16:11

jwhitlock merged commit a9c3c23 into mozilla-services:master May 16, 2019

jwhitlock deleted the sentry-sdk-1543097 branch May 16, 2019 16:25

jwhitlock mentioned this pull request Sep 9, 2019

Convert from raven to sentry-sdk mozilla/ichnaea#882

Open

jwhitlock mentioned this pull request Apr 20, 2021

Add Sentry middleware, rely on server-side filtering mozilla-it/ctms-api#145

Merged

		from socorro.lib.sentry_client import SentryFilter, SentryProcessor, SENTRY_LOG_NAME


		class CrumbTruncateQuery(SentryFilter):
