Remove sensitive data from analytics #1563

Merged
merged 5 commits into from
Mar 13, 2023

Conversation

emilk
Member

@emilk emilk commented Mar 11, 2023

I forgot to anonymize the file path in the file:line location of panics, so we have been leaking full paths into our analytics. Those paths can include user names, which is really bad.
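As a rough illustration of what anonymizing a panic location can look like, here is a minimal sketch using only the standard library. It is not the code from this PR: the helper names `anonymize_source_path` and `anonymized_location` are hypothetical, and the rule it applies (keep the directory just before the last `src` component, else only the file name) is an assumption made for the example.

```rust
/// Hypothetical helper (not from this PR): keep only the directory just
/// before the last `src` component, `src` itself, and what follows it.
fn anonymize_source_path(path: &str) -> String {
    // Split on both separators so Windows-style paths are handled on any host.
    let parts: Vec<&str> = path
        .split(|c: char| c == '/' || c == '\\')
        .filter(|p| !p.is_empty())
        .collect();

    // Look for the last `src` component.
    match parts.iter().rposition(|p| *p == "src") {
        Some(src_idx) => {
            // The component before `src` is usually the crate name.
            let start = src_idx.saturating_sub(1);
            parts[start..].join("/")
        }
        // No `src` found: fall back to the bare file name.
        None => parts.last().map(|s| s.to_string()).unwrap_or_default(),
    }
}

/// Hypothetical helper: build the `file:line` string from anonymized parts.
fn anonymized_location(file: &str, line: u32) -> String {
    format!("{}:{}", anonymize_source_path(file), line)
}
```

In a panic hook, the inputs would come from `Location::file()` and `Location::line()`. Note that this sketch still keeps one directory above `src`, so a path like `/home/alice/src/main.rs` would retain `alice`; keeping only the components from `src` onward would be the stricter choice.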

We also had the rerun git_branch included (for users building from source), which can also contain sensitive stuff.

All affected analytics events have been deleted, and new ones will be filtered.

A patch-release is coming in

Checklist

@emilk emilk added 🪳 bug Something isn't working 📊 analytics telemetry analytics labels Mar 11, 2023
@Wumpf Wumpf self-requested a review March 13, 2023 14:59
@emilk emilk added the do-not-merge Do not merge this PR label Mar 13, 2023
@emilk emilk removed the do-not-merge Do not merge this PR label Mar 13, 2023
@emilk emilk changed the title from "Improve panic analytics" to "Remove sensitive data from analytics" Mar 13, 2023
@emilk emilk merged commit bc7cbaf into main Mar 13, 2023
@emilk emilk deleted the emilk/better-crash-analytics branch March 13, 2023 15:23
emilk added a commit that referenced this pull request Mar 13, 2023
* Analytics: Anonymize the file path of the location of a panic
* Remove git_branch from analytics
@nikolausWest
Member

Summary of issue

  • For users that built Rerun from source, analytics events would include the current git branch name (of the rerun repository)
  • For users that experienced a panic, the file path and line number (of where the panic happened in rerun's code) were included in the analytics event. These paths weren't properly anonymized and could therefore include identifiable information in some cases.

Resolution

In addition to fixing the code, we have scrubbed all analytics databases of any traces of the leaked non-anonymized paths and git branch names. We have also set up server-side filters that drop the affected data before ingestion, so that none of it reaches our analytics databases in the future.

  • All events related to anonymous user IDs that were associated with one or more panic events with non-anonymized file paths have been deleted
  • All events related to anonymous user IDs that were associated with a branch name that wasn't created by the core Rerun team have been deleted
  • Server-side event filtering has been added to drop any similarly affected data in the future (coming in from version 0.3.0)
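The filtering step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the filter Rerun actually deployed: the `Event` shape, the `file_line` property name, and the absolute-path heuristic are all hypothetical, and only the `git_branch` name comes from the PR itself.

```rust
use std::collections::HashMap;

/// Hypothetical event shape: a name plus string properties.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    name: String,
    props: HashMap<String, String>,
}

/// Heuristic (an assumption): a Unix absolute path or a Windows drive path.
fn looks_like_absolute_path(value: &str) -> bool {
    let bytes = value.as_bytes();
    let windows_drive = bytes.len() >= 3
        && bytes[0].is_ascii_alphabetic()
        && bytes[1] == b':'
        && (bytes[2] == b'\\' || bytes[2] == b'/');
    value.starts_with('/') || windows_drive
}

/// Returns `None` to drop the event, or a scrubbed copy to ingest.
fn filter_event(mut event: Event) -> Option<Event> {
    // Never ingest branch names.
    event.props.remove("git_branch");

    // Drop events whose panic location still carries a full path.
    // `file_line` is a hypothetical property name.
    if let Some(location) = event.props.get("file_line") {
        if looks_like_absolute_path(location) {
            return None;
        }
    }
    Some(event)
}
```

In this sketch, the branch name is stripped while the rest of the event is kept, whereas an event with a leaked path is dropped entirely; whether to strip a field or drop the whole event is a design choice that the summary above does not specify.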

Labels
📊 analytics telemetry analytics 🪳 bug Something isn't working

3 participants