
fix: Redact SSO PII before deletion #38425

Open
ktyagiapphelix2u wants to merge 6 commits into openedx:master from ktyagiapphelix2u:ktyagi/SSOPII


@ktyagiapphelix2u ktyagiapphelix2u commented Apr 23, 2026

Description

Implements automatic PII redaction for UserSocialAuth records before deletion to prevent personally identifiable information from persisting after records are removed.

Problem

When users unlink SSO accounts or are retired, UserSocialAuth records are deleted from the LMS database. However, sensitive data in fields like uid and extra_data could still persist indirectly (e.g., in logs, backups, or downstream systems), creating a data retention and privacy risk.

Solution

- Added a redact_user_social_auth_pii() utility function to sanitize sensitive data before deletion
- Implemented a Django pre_delete signal handler to automatically redact PII across all deletion paths
- Updated the retire_user management command to explicitly invoke redaction prior to deletion
- Redacted values:
  - uid = "redacted@redacted.com"
  - extra_data = {}

Jira Ticket

https://2u-internal.atlassian.net/browse/BOMS-514

@ktyagiapphelix2u ktyagiapphelix2u marked this pull request as ready for review April 23, 2026 11:29
@ktyagiapphelix2u ktyagiapphelix2u requested a review from a team as a code owner April 23, 2026 11:29
social_auth = self.create_social_auth()

redact_user_social_auth_pii(social_auth)
redact_user_social_auth_pii(social_auth)

Contributor

@ktyagiapphelix2u Maybe adding a comment on line #189 would help, e.g. "duplicate call to redact user method to validate idempotency".

Contributor Author

Added
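
For readers following along, here is a minimal sketch of the idempotency check being discussed. It uses a plain dataclass in place of the real model: `FakeSocialAuth`, `snapshot`, and this body of `redact_user_social_auth_pii` are assumptions for illustration, not the PR's code, and the placeholder value is the one listed in the PR description.

```python
from dataclasses import dataclass, field

# Placeholder value as listed in the PR description.
REDACTED_SOCIAL_AUTH_UID = "redacted@redacted.com"


@dataclass
class FakeSocialAuth:
    """Hypothetical in-memory stand-in for a UserSocialAuth row."""
    provider: str
    uid: str
    extra_data: dict = field(default_factory=dict)


def redact_user_social_auth_pii(social_auth):
    """Overwrite the PII-bearing fields; safe to call more than once."""
    social_auth.uid = REDACTED_SOCIAL_AUTH_UID
    social_auth.extra_data = {}


def snapshot(social_auth):
    """Return a comparable copy of the fields that redaction touches."""
    return (social_auth.uid, dict(social_auth.extra_data))


auth = FakeSocialAuth("google-oauth2", "user@example.com", {"email": "user@example.com"})

redact_user_social_auth_pii(auth)
first = snapshot(auth)

# Duplicate call to validate idempotency: the second pass must change nothing.
redact_user_social_auth_pii(auth)
second = snapshot(auth)

assert first == second == (REDACTED_SOCIAL_AUTH_UID, {})
```

The point of the second call is only to show that redacting an already-redacted record is a no-op, which is what the requested inline comment would document.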

Comment on lines +196 to +218
"""
Test that redaction works correctly for multiple SSO providers (Google OAuth and SAML).
"""
google_auth = self.create_social_auth(
    provider='google-oauth2',
    uid='google@example.com',
    extra_data={'email': 'google@example.com', 'name': 'Google User'}
)
saml_auth = self.create_social_auth(
    provider='tpa-saml',
    uid='saml@example.com',
    extra_data={'email': 'saml@example.com', 'name': 'SAML User', 'uid': 'saml-uid'}
)

redact_user_social_auth_pii(google_auth)
redact_user_social_auth_pii(saml_auth)
google_auth.refresh_from_db()
saml_auth.refresh_from_db()

assert google_auth.uid == REDACTED_SOCIAL_AUTH_UID
assert google_auth.extra_data == {}
assert saml_auth.uid == REDACTED_SOCIAL_AUTH_UID
assert saml_auth.extra_data == {}

Contributor

Suggested change
"""
Test that redaction works correctly for multiple SSO providers (Google OAuth and SAML).
"""
google_auth = self.create_social_auth(
    provider='google-oauth2',
    uid='google@example.com',
    extra_data={'email': 'google@example.com', 'name': 'Google User'}
)
saml_auth = self.create_social_auth(
    provider='tpa-saml',
    uid='saml@example.com',
    extra_data={'email': 'saml@example.com', 'name': 'SAML User', 'uid': 'saml-uid'}
)
redact_user_social_auth_pii(google_auth)
redact_user_social_auth_pii(saml_auth)
google_auth.refresh_from_db()
saml_auth.refresh_from_db()
assert google_auth.uid == REDACTED_SOCIAL_AUTH_UID
assert google_auth.extra_data == {}
assert saml_auth.uid == REDACTED_SOCIAL_AUTH_UID
assert saml_auth.extra_data == {}

"""
Test that redaction works correctly for multiple SSO providers.
"""
auths = [
    self.create_social_auth(
        provider="google-oauth2",
        uid="google@example.com",
        extra_data={"email": "google@example.com", "name": "Google User"},
    ),
    self.create_social_auth(
        provider="tpa-saml",
        uid="saml@example.com",
        extra_data={"email": "saml@example.com", "name": "SAML User", "uid": "saml-uid"},
    ),
]
for auth in auths:
    redact_user_social_auth_pii(auth)
    auth.refresh_from_db()
    assert auth.uid == REDACTED_SOCIAL_AUTH_UID
    assert auth.extra_data == {}

Contributor Author

Implemented


ENABLE_SECONDARY_EMAIL_FEATURE_SWITCH = 'enable_secondary_email_feature'
LOGGER = logging.getLogger(__name__)
REDACTED_SOCIAL_AUTH_UID = 'redacted@redacted.com'

Contributor

unique_together = ("provider", "uid") from UserSocialAuth means retiring a second user on the same provider will raise an IntegrityError when both rows get uid = 'redacted@redacted.com'. Use a per-record value instead, e.g. f'redacted_{user_social_auth.pk}@retired.invalid'.

Contributor Author

Thanks, this would have been a critical bug.

Before: All retired users with the same SSO provider would get uid = 'redacted@redacted.com', violating the unique_together = ("provider", "uid") constraint.

After: Each retired user gets a unique UID like redacted_123@retired.invalid, preventing database constraint violations while still clearly marking the data as redacted.
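
The collision described in this thread can be reproduced outside Django. The sketch below uses the standard-library `sqlite3` module with a hypothetical `social_auth` table that only mirrors the `("provider", "uid")` uniqueness rule; the table layout and the helper names `redact_constant` and `redact_per_record` are assumptions for illustration, not the real schema or the PR's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table that mirrors only the (provider, uid) uniqueness rule.
conn.execute(
    "CREATE TABLE social_auth ("
    " id INTEGER PRIMARY KEY,"
    " provider TEXT NOT NULL,"
    " uid TEXT NOT NULL,"
    " UNIQUE (provider, uid))"
)
conn.executemany(
    "INSERT INTO social_auth (id, provider, uid) VALUES (?, ?, ?)",
    [(1, "tpa-saml", "alice@example.com"), (2, "tpa-saml", "bob@example.com")],
)


def redact_constant(row_id):
    """Redact with one shared placeholder (the approach flagged in review)."""
    conn.execute(
        "UPDATE social_auth SET uid = ? WHERE id = ?",
        ("redacted@redacted.com", row_id),
    )


def redact_per_record(row_id):
    """Redact with a placeholder derived from the primary key."""
    conn.execute(
        "UPDATE social_auth SET uid = ? WHERE id = ?",
        (f"redacted_{row_id}@retired.invalid", row_id),
    )


redact_constant(1)
try:
    # Second row on the same provider gets the same uid: constraint violation.
    redact_constant(2)
    collided = False
except sqlite3.IntegrityError:
    collided = True

# Per-record placeholders never collide with each other.
redact_per_record(1)
redact_per_record(2)
uids = [row[0] for row in conn.execute("SELECT uid FROM social_auth ORDER BY id")]
```

The constant placeholder only collides while two redacted rows for the same provider exist at the same time; deriving the value from the primary key removes that dependency on timing.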

Comment on lines +43 to +49
except Exception as e:  # pylint: disable=broad-except
    # Log the error but don't prevent the deletion
    logger.exception(
        "Failed to redact PII for UserSocialAuth before deletion: user_id=%s, provider=%s, error=%s",
        instance.user_id,
        instance.provider,
        str(e)

Contributor

Swallowing the exception here means that if redaction fails, deletion proceeds with PII intact, which is exactly what this PR is meant to prevent. Either re-raise after logging, or explicitly document "best-effort" as the intended behavior.

Contributor Author

Done! I've updated the signal handler to re-raise the exception after logging.

Before:
Redaction fails → Exception logged → Deletion proceeds → PII leaked

After:
Redaction fails → Exception logged → Exception re-raised → Deletion blocked → PII protected
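
The before/after flow above can be sketched in plain Python. `FakeRecord`, `failing_redact`, `redact_before_delete`, and `delete` are hypothetical stand-ins for the model, the redaction helper, the signal receiver, and the ORM delete; this is not the PR's handler.

```python
import logging

logger = logging.getLogger(__name__)


class FakeRecord:
    """Hypothetical stand-in for a UserSocialAuth instance."""

    def __init__(self, user_id, provider, uid):
        self.user_id = user_id
        self.provider = provider
        self.uid = uid
        self.deleted = False


def failing_redact(record):
    """Simulates a redaction step that fails (for example, a database error)."""
    raise RuntimeError("redaction failed")


def redact_before_delete(record, redact=failing_redact):
    """Receiver-style hook: log the failure, then re-raise so the caller stops."""
    try:
        redact(record)
    except Exception:
        logger.exception(
            "Failed to redact PII before deletion: user_id=%s, provider=%s",
            record.user_id,
            record.provider,
        )
        raise  # Without this, the delete below would run with PII intact.


def delete(record):
    """Stand-in for a delete that runs its pre-delete hook first."""
    redact_before_delete(record)
    record.deleted = True


record = FakeRecord(user_id=42, provider="tpa-saml", uid="saml@example.com")
try:
    delete(record)
    blocked = False
except RuntimeError:
    blocked = True
```

In Django itself, `Signal.send()` lets receiver exceptions propagate (unlike `send_robust()`), which is why re-raising inside a `pre_delete` receiver is enough to stop the delete.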



@skip_unless_lms
def test_retire_user_redacts_sso_pii_before_deletion(setup_retirement_states): # lint-amnesty, pylint: disable=redefined-outer-name, unused-argument # noqa: F811

Contributor

This test only checks the record was deleted — it doesn't verify redaction happened before deletion. Use a mock.side_effect on delete_by_user_value to assert the redacted state mid-flow.

Contributor Author

Updated the test, thanks.
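
A minimal sketch of the `side_effect` technique suggested above: the mock replaces the delete call and snapshots the record at the moment deletion is requested. `FakeStore`, `retire`, and `capture_then_delete` are hypothetical stand-ins; the real test would patch `delete_by_user_value` in the retirement flow instead.

```python
from unittest import mock


class FakeStore:
    """Hypothetical stand-in for the storage layer holding one SSO record."""

    def __init__(self):
        self.record = {"uid": "saml@example.com", "extra_data": {"email": "saml@example.com"}}

    def delete(self):
        self.record = None


def retire(store):
    """Redact first, then delete (the order under test)."""
    store.record["uid"] = "redacted_1@retired.invalid"
    store.record["extra_data"] = {}
    store.delete()


store = FakeStore()
seen_at_delete = []


def capture_then_delete(*args, **kwargs):
    # Runs in place of the real delete: snapshot the record as it looks right now.
    seen_at_delete.append(dict(store.record))
    store.record = None


with mock.patch.object(store, "delete", side_effect=capture_then_delete) as fake_delete:
    retire(store)

fake_delete.assert_called_once()
# The record was already redacted when deletion was requested.
assert seen_at_delete == [{"uid": "redacted_1@retired.invalid", "extra_data": {}}]
assert store.record is None
```

Checking only that the record is gone afterwards cannot distinguish "redacted, then deleted" from "deleted without redaction"; the snapshot taken inside the side effect can.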

# Unlink LMS social auth accounts
UserSocialAuth.objects.filter(user_id=user.id).delete()
# Redact and unlink LMS social auth accounts
for social_auth in UserSocialAuth.objects.filter(user_id=user.id):

Contributor

1. In the original PR where we fixed redact-before-delete, didn't we already establish the use of Django calls that avoid looping?
2. Can we avoid the separate method and just set hard-coded values? Which fields need to be updated, and do you have example values that contain PII? Just curious: the current redact_user_social_auth_pii appears to clear out multiple values.
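
To make the first question concrete, here is one way a loop-free redaction could look at the SQL level, sketched with the standard-library `sqlite3` module. The `social_auth` table and its columns are a hypothetical simplification, and whether the earlier PR used an equivalent Django queryset call is not confirmed by this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; extra_data is stored as JSON text here.
conn.execute(
    "CREATE TABLE social_auth ("
    " id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,"
    " provider TEXT NOT NULL, uid TEXT NOT NULL, extra_data TEXT NOT NULL,"
    " UNIQUE (provider, uid))"
)
conn.executemany(
    "INSERT INTO social_auth (id, user_id, provider, uid, extra_data) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 7, "google-oauth2", "google@example.com", '{"email": "google@example.com"}'),
        (2, 7, "tpa-saml", "saml@example.com", '{"email": "saml@example.com"}'),
        (3, 8, "tpa-saml", "other@example.com", '{"email": "other@example.com"}'),
    ],
)

# One set-based statement redacts every row for the user; the per-record uid
# is derived from the primary key, so no Python loop is needed.
cursor = conn.execute(
    "UPDATE social_auth"
    " SET uid = 'redacted_' || id || '@retired.invalid', extra_data = '{}'"
    " WHERE user_id = ?",
    (7,),
)
redacted_count = cursor.rowcount
redacted_rows = conn.execute(
    "SELECT uid, extra_data FROM social_auth WHERE user_id = ? ORDER BY id", (7,)
).fetchall()

# A second set-based statement removes the now-redacted rows.
conn.execute("DELETE FROM social_auth WHERE user_id = ?", (7,))
remaining = conn.execute("SELECT id FROM social_auth").fetchall()
```

The trade-off is that a set-based update bypasses any per-instance hooks, so it would need to be the explicit redaction step rather than something triggered by a signal.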

4 participants