fix: Redact SSO PII before deletion #38425
ktyagiapphelix2u wants to merge 6 commits into openedx:master
```python
social_auth = self.create_social_auth()

redact_user_social_auth_pii(social_auth)
redact_user_social_auth_pii(social_auth)
```
@ktyagiapphelix2u maybe adding a comment on line #189 would help: "duplicate call to redact user method to validate idempotency".
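A framework-free sketch of the property the duplicate call is meant to check: when the placeholder values depend only on the record itself (here its primary key, following the per-record format suggested further down in this review), a second redaction writes exactly the same values. `SocialAuthRecord` and `redact_pii` are hypothetical stand-ins, not the PR's `UserSocialAuth` model or its `redact_user_social_auth_pii` helper.

```python
from dataclasses import dataclass, field


@dataclass
class SocialAuthRecord:
    """Hypothetical stand-in for a UserSocialAuth row (not the real model)."""
    pk: int
    provider: str
    uid: str
    extra_data: dict = field(default_factory=dict)


def redact_pii(record: SocialAuthRecord) -> None:
    """Hypothetical redaction: overwrite the PII-bearing fields in place."""
    # The placeholder is derived from the primary key only, so repeated calls
    # always produce the same value and never reuse the original uid.
    record.uid = f"redacted_{record.pk}@retired.invalid"
    record.extra_data = {}


record = SocialAuthRecord(
    pk=7,
    provider="google-oauth2",
    uid="google@example.com",
    extra_data={"email": "google@example.com", "name": "Google User"},
)
redact_pii(record)
first_pass = (record.uid, dict(record.extra_data))

# Duplicate call to validate idempotency: the second pass must change nothing.
redact_pii(record)
assert (record.uid, record.extra_data) == first_pass
print(record.uid)  # prints redacted_7@retired.invalid
```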
| """ | ||
| Test that redaction works correctly for multiple SSO providers (Google OAuth and SAML). | ||
| """ | ||
| google_auth = self.create_social_auth( | ||
| provider='google-oauth2', | ||
| uid='google@example.com', | ||
| extra_data={'email': 'google@example.com', 'name': 'Google User'} | ||
| ) | ||
| saml_auth = self.create_social_auth( | ||
| provider='tpa-saml', | ||
| uid='saml@example.com', | ||
| extra_data={'email': 'saml@example.com', 'name': 'SAML User', 'uid': 'saml-uid'} | ||
| ) | ||
|
|
||
| redact_user_social_auth_pii(google_auth) | ||
| redact_user_social_auth_pii(saml_auth) | ||
| google_auth.refresh_from_db() | ||
| saml_auth.refresh_from_db() | ||
|
|
||
| assert google_auth.uid == REDACTED_SOCIAL_AUTH_UID | ||
| assert google_auth.extra_data == {} | ||
| assert saml_auth.uid == REDACTED_SOCIAL_AUTH_UID | ||
| assert saml_auth.extra_data == {} |
| """ | |
| Test that redaction works correctly for multiple SSO providers (Google OAuth and SAML). | |
| """ | |
| google_auth = self.create_social_auth( | |
| provider='google-oauth2', | |
| uid='google@example.com', | |
| extra_data={'email': 'google@example.com', 'name': 'Google User'} | |
| ) | |
| saml_auth = self.create_social_auth( | |
| provider='tpa-saml', | |
| uid='saml@example.com', | |
| extra_data={'email': 'saml@example.com', 'name': 'SAML User', 'uid': 'saml-uid'} | |
| ) | |
| redact_user_social_auth_pii(google_auth) | |
| redact_user_social_auth_pii(saml_auth) | |
| google_auth.refresh_from_db() | |
| saml_auth.refresh_from_db() | |
| assert google_auth.uid == REDACTED_SOCIAL_AUTH_UID | |
| assert google_auth.extra_data == {} | |
| assert saml_auth.uid == REDACTED_SOCIAL_AUTH_UID | |
| assert saml_auth.extra_data == {} | |
| """ | |
| Test that redaction works correctly for multiple SSO providers. | |
| """ | |
| auths = [ | |
| self.create_social_auth( | |
| provider="google-oauth2", | |
| uid="google@example.com", | |
| extra_data={"email": "google@example.com", "name": "Google User"}, | |
| ), | |
| self.create_social_auth( | |
| provider="tpa-saml", | |
| uid="saml@example.com", | |
| extra_data={"email": "saml@example.com", "name": "SAML User", "uid": "saml-uid"}, | |
| ), | |
| ] | |
| for auth in auths: | |
| redact_user_social_auth_pii(auth) | |
| auth.refresh_from_db() | |
| assert auth.uid == REDACTED_SOCIAL_AUTH_UID | |
| assert auth.extra_data == {} |
Implemented
```python
ENABLE_SECONDARY_EMAIL_FEATURE_SWITCH = 'enable_secondary_email_feature'
LOGGER = logging.getLogger(__name__)
REDACTED_SOCIAL_AUTH_UID = 'redacted@redacted.com'
```
unique_together = ("provider", "uid") from UserSocialAuth means retiring a second user on the same provider will raise an IntegrityError when both rows get uid = 'redacted@redacted.com'. Use a per-record value instead, e.g. f'redacted_{user_social_auth.pk}@retired.invalid'.
Thanks, before this fix it would have been a critical bug.
Before: all retired users with the same SSO provider would get uid = 'redacted@redacted.com', violating the unique_together = ("provider", "uid") constraint.
After: each retired user gets a unique UID such as redacted_123@retired.invalid, which prevents the constraint violation while still clearly marking the data as redacted.
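The constraint problem can be reproduced with the standard-library `sqlite3` module. The table below is a simplified assumption that only mirrors the `UNIQUE (provider, uid)` constraint; the collision appears as soon as two rows for the same provider hold the shared placeholder at the same time, while a placeholder that embeds the primary key does not collide.

```python
import sqlite3

# Simplified table mirroring unique_together = ("provider", "uid"); not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE social_auth ("
    " id INTEGER PRIMARY KEY, user_id INTEGER, provider TEXT, uid TEXT,"
    " UNIQUE (provider, uid))"
)
conn.executemany(
    "INSERT INTO social_auth (id, user_id, provider, uid) VALUES (?, ?, ?, ?)",
    [(1, 101, "tpa-saml", "alice@example.com"), (2, 102, "tpa-saml", "bob@example.com")],
)


def redact_with_constant(row_id):
    # Old behavior: every redacted row gets the same placeholder uid.
    conn.execute(
        "UPDATE social_auth SET uid = 'redacted@redacted.com' WHERE id = ?", (row_id,)
    )


def redact_per_record(row_id):
    # Fixed behavior: the placeholder embeds the primary key, so it is unique.
    conn.execute(
        "UPDATE social_auth SET uid = 'redacted_' || id || '@retired.invalid' WHERE id = ?",
        (row_id,),
    )


redact_with_constant(1)
try:
    redact_with_constant(2)  # second row on the same provider violates UNIQUE
    collided = False
except sqlite3.IntegrityError:
    collided = True

redact_per_record(1)
redact_per_record(2)  # no collision: each row has its own placeholder
uids = [row[0] for row in conn.execute("SELECT uid FROM social_auth ORDER BY id")]
print(collided, uids)
```

Embedding the primary key keeps the placeholder unique without storing anything derived from the original `uid`.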
```python
except Exception as e:  # pylint: disable=broad-except
    # Log the error but don't prevent the deletion
    logger.exception(
        "Failed to redact PII for UserSocialAuth before deletion: user_id=%s, provider=%s, error=%s",
        instance.user_id,
        instance.provider,
        str(e)
    )
```
Swallowing the exception here means that if redaction fails, deletion proceeds with the PII intact, which is exactly what this PR is meant to prevent. Either re-raise after logging, or explicitly document best-effort redaction as the intended behavior.
Done! I've updated the signal handler to re-raise the exception after logging.
Before:
Redaction fails → Exception logged → Deletion proceeds → PII leaked
After:
Redaction fails → Exception logged → Exception re-raised → Deletion blocked → PII protected
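A minimal sketch of the re-raise behavior, using `sqlite3` and a transaction in place of Django's delete machinery (the table, `failing_redaction`, and `delete_with_redaction` are hypothetical): because the exception is re-raised after logging, the `DELETE` never runs and the transaction rolls back. In Django, an exception raised from a `pre_delete` receiver likewise propagates out of the `delete()` call, so the row should stay in place until redaction succeeds.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE social_auth (id INTEGER PRIMARY KEY, provider TEXT, uid TEXT)")
conn.execute("INSERT INTO social_auth VALUES (1, 'tpa-saml', 'alice@example.com')")
conn.commit()


def failing_redaction(row_id):
    # Hypothetical redaction step that fails (e.g., a database error).
    raise RuntimeError("redaction failed")


def delete_with_redaction(row_id, redact):
    """Redact first; if that raises, log and re-raise so the delete never runs."""
    with conn:  # commits on success, rolls back if an exception escapes
        try:
            redact(row_id)
        except Exception:
            logger.exception("Failed to redact PII before deletion: id=%s", row_id)
            raise  # propagate: the DELETE below is skipped
        conn.execute("DELETE FROM social_auth WHERE id = ?", (row_id,))


try:
    delete_with_redaction(1, failing_redaction)
except RuntimeError:
    pass

remaining = conn.execute("SELECT COUNT(*) FROM social_auth").fetchone()[0]
uid_after = conn.execute("SELECT uid FROM social_auth WHERE id = 1").fetchone()[0]
print(remaining)  # prints 1: deletion was blocked, so the record can be retried
```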
```python
@skip_unless_lms
def test_retire_user_redacts_sso_pii_before_deletion(setup_retirement_states):  # lint-amnesty, pylint: disable=redefined-outer-name, unused-argument # noqa: F811
```
This test only checks that the record was deleted; it doesn't verify that redaction happened before deletion. Use a `mock.side_effect` on `delete_by_user_value` to assert the redacted state mid-flow.
Updated the test, thanks.
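A sketch of the `side_effect` idea using only `unittest.mock`: the mocked delete function inspects the records at the moment it is called, so the test fails if deletion happens before redaction. `retire_social_auth`, `retire_wrong_order`, `redact`, and `SocialAuthRecord` are hypothetical stand-ins; in the real test the mock would replace `delete_by_user_value` (via `mock.patch` or `mock.patch.object`), and the exact patch target depends on how the retirement code looks it up.

```python
from dataclasses import dataclass, field
from unittest import mock


@dataclass
class SocialAuthRecord:
    # Hypothetical stand-in for a UserSocialAuth row.
    pk: int
    uid: str
    extra_data: dict = field(default_factory=dict)


def redact(record):
    record.uid = f"redacted_{record.pk}@retired.invalid"
    record.extra_data = {}


def retire_social_auth(records, redact_fn, delete_fn):
    """Hypothetical retirement flow: redact every record, then delete them."""
    for record in records:
        redact_fn(record)
    delete_fn(records)


def retire_wrong_order(records, redact_fn, delete_fn):
    """Hypothetical buggy flow that deletes before redacting."""
    delete_fn(records)
    for record in records:
        redact_fn(record)


def assert_redacted_at_delete_time(recs):
    # Runs when deletion is requested, so it checks the mid-flow state.
    for rec in recs:
        assert rec.uid == f"redacted_{rec.pk}@retired.invalid"
        assert rec.extra_data == {}


records = [SocialAuthRecord(1, "alice@example.com", {"email": "alice@example.com"})]
delete_mock = mock.Mock(side_effect=assert_redacted_at_delete_time)
retire_social_auth(records, redact, delete_mock)
delete_mock.assert_called_once_with(records)

# The same side_effect catches the wrong ordering.
fresh = [SocialAuthRecord(2, "bob@example.com", {"email": "bob@example.com"})]
try:
    retire_wrong_order(fresh, redact, mock.Mock(side_effect=assert_redacted_at_delete_time))
    caught = False
except AssertionError:
    caught = True
```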
```diff
-# Unlink LMS social auth accounts
-UserSocialAuth.objects.filter(user_id=user.id).delete()
+# Redact and unlink LMS social auth accounts
+for social_auth in UserSocialAuth.objects.filter(user_id=user.id):
```
- In the original PR where we fixed redact-before-delete, didn't we already establish the use of Django calls that avoid looping?
- Can we avoid the separate method and just set some hard-coded values? What are all the fields we need to update, and do you have example values that contain PII? Just curious. From the current `redact_user_social_auth_pii`, it looks like there are multiple values you are clearing out.
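On the looping question, one set-based option is sketched below at the SQL level with `sqlite3`, using a simplified, assumed schema: a single `UPDATE` redacts all of a user's rows (each still gets its own placeholder via its primary key), and a single `DELETE` follows in the same transaction. In Django, the analogous approach would be `QuerySet.update()` with `Concat`/`Cast` expressions; note that `update()` does not call `save()` or send `pre_save`/`post_save` signals.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE social_auth ("
    " id INTEGER PRIMARY KEY, user_id INTEGER, provider TEXT, uid TEXT, extra_data TEXT,"
    " UNIQUE (provider, uid))"
)
conn.executemany(
    "INSERT INTO social_auth VALUES (?, ?, ?, ?, ?)",
    [
        (1, 42, "google-oauth2", "google@example.com", json.dumps({"email": "google@example.com"})),
        (2, 42, "tpa-saml", "saml@example.com", json.dumps({"uid": "saml-uid"})),
        (3, 99, "tpa-saml", "other@example.com", json.dumps({})),
    ],
)

user_id = 42
with conn:  # one transaction: redact all of the user's rows, then delete them
    # Set-based redaction: no Python loop; each row's uid embeds its own id.
    conn.execute(
        "UPDATE social_auth"
        " SET uid = 'redacted_' || id || '@retired.invalid', extra_data = '{}'"
        " WHERE user_id = ?",
        (user_id,),
    )
    # Snapshot the redacted state just before deletion, for inspection.
    redacted = conn.execute(
        "SELECT id, uid, extra_data FROM social_auth WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    conn.execute("DELETE FROM social_auth WHERE user_id = ?", (user_id,))

remaining = conn.execute("SELECT id FROM social_auth").fetchall()
print(redacted)
print(remaining)
```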
Description
Implements automatic PII redaction for UserSocialAuth records before deletion to prevent personally identifiable information from persisting after records are removed.
Problem
When users unlink SSO accounts or are retired, UserSocialAuth records are deleted from the LMS database. However, sensitive data in fields like uid and extra_data could still persist indirectly (e.g., in logs, backups, or downstream systems), creating a data retention and privacy risk.
Solution
- Added a `redact_user_social_auth_pii()` utility function that sanitizes sensitive data before deletion
- Implemented a Django `pre_delete` signal handler to automatically redact PII across all deletion paths
- Updated the `retire_user` management command to explicitly invoke redaction prior to deletion

Redacted values:
- `uid = "redacted@redacted.com"`
- `extra_data = {}`
Jira Ticket
https://2u-internal.atlassian.net/browse/BOMS-514